Abstract. Algebraic statistics for binary random variables is concerned with highly structured algebraic varieties in the space of $2\times 2\times\cdots\times 2$-tensors. We demonstrate the advantages of representing such varieties in the coordinate system of binary cumulants. Our primary focus lies on hidden subset models. Parametrizations and implicit equations in cumulants are derived for hyperdeterminants, for secant and tangential varieties of Segre varieties, and for certain context-specific independence models. Extending work of Rota and collaborators, we explore the polynomial inequalities satisfied by cumulants.
1. Introduction

Cumulants have a long and interesting history dating back to Thorvald N. Thiele, a Danish mathematician, who introduced them in 1889. See [8] for a historical perspective. The main motivation to study them was that multivariate probability distributions are often easier to analyze when expressed in terms of cumulants. Moreover, cumulants are especially useful when dealing with the normal distribution, and hence they are a critical tool in asymptotic statistics (see e.g. [2, 11, 23, 26]). Various invariance properties of cumulants make them interesting also from an algebraic or combinatorial point of view. Rota and his collaborators [1, 21] developed a combinatorial theory of cumulants, and, more recently, Pistone and Wynn introduced cumulant varieties [16] into algebraic statistics. These concepts gave rise to umbral calculus [20], an approach to combinatorial sequences using cumulants.

Building on this circle of ideas, we show how cumulants can be used to study algebraic varieties in tensor spaces. Thus cumulants can also be used outside of the probabilistic context, where we deal with sequences of nonnegative numbers summing to 1. Here we focus on binary states. Let $P = [p_I]_{I\subseteq[n]}$ be an $n$-dimensional $2\times 2\times\cdots\times 2$ table of complex numbers such that $\sum_I p_I = 1$. We call such tensors distributions. In statistical contexts one assumes in addition that the $p_I$ are real and nonnegative, in which case we call them probability distributions. In algebraic statistics, the probabilities $p_I$ form the coordinates of the ambient space containing statistical models. For an introduction to this geometric point of view see [3].

We represent the distribution $P$ by the probability generating function

$$P(x) = \sum_{I\subseteq[n]} p_I \prod_{i\in I} x_i.$$

Here $[n] = \{1,2,\dots,n\}$ and we identify our tables with functions on subsets of $[n]$. In the probabilistic context we occasionally refer to the random vector $X = (X_1,\dots,X_n)$ with values in $\{0,1\}^n$ and distribution $P$. We use here the natural identification of a subset $I\subseteq[n]$ with its support vector. An alternative representation of $P$ is the table of moments $M = [\mu_I]_{I\subseteq[n]}$, where

$$\mu_I = \sum_{J\supseteq I} p_J. \qquad (1)$$

The moment generating function is a square-free polynomial in $n$ unknowns:

$$M(x) = P(x_1+1,\dots,x_n+1) = \sum_{I\subseteq[n]} \mu_I \prod_{i\in I} x_i. \qquad (2)$$

The logarithm of the moment generating function gives the cumulants:

$$K(x) = \sum_{I\subseteq[n]} k_I \prod_{i\in I} x_i = \log(M(x)). \qquad (3)$$

Note that $\mu_\emptyset = 1$ and $k_\emptyset = 0$. For the logarithm we use the familiar series $\log(1+t) = \sum_{i\geq 1}(-1)^{i+1}t^i/i$. That expansion is understood modulo the ideal $\langle x_1^2, x_2^2, \dots, x_n^2\rangle$. The moments can then be recovered from the cumulants via

$$M(x) = \exp(K(x)). \qquad (4)$$

The transformations (3) and (4) between moments $\mu_I$ and cumulants $k_I$ can be written as explicit combinatorial formulas (see e.g. [11, §2.3], [21, 23]). Given any $I\subseteq[n]$, let $\Pi(I)$ be the lattice of all set partitions of $I$. We have

$$k_I = \sum_{\pi\in\Pi(I)} (-1)^{|\pi|-1}(|\pi|-1)! \prod_{B\in\pi} \mu_B. \qquad (5)$$

The sum is over partitions of $I$, the product is over blocks of a partition, and $|\pi|$ denotes the number of blocks of $\pi$. The moments in terms of cumulants are

$$\mu_I = \sum_{\pi\in\Pi(I)} \prod_{B\in\pi} k_B \quad \text{for all } I\subseteq[n]. \qquad (6)$$

For instance, $I = \{1,2,3\}$ has five partitions $123$, $1|23$, $2|13$, $12|3$, $1|2|3$, and

$$k_{123} = \mu_{123} - \mu_1\mu_{23} - \mu_2\mu_{13} - \mu_{12}\mu_3 + 2\mu_1\mu_2\mu_3,$$
$$\mu_{123} = k_{123} + k_{12}k_3 + k_{13}k_2 + k_{23}k_1 + k_1k_2k_3.$$
The transformation from (6) to (5) is the Möbius inversion on the partition lattice $\Pi([n])$, as seen in enumerative combinatorics [24, Exercise 3.44].
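As an illustration of (1), (5) and (6), the following Python sketch (our own, not code from this paper) converts a table of probabilities into moments and cumulants and back. It assumes a bitmask encoding in which bit $i-1$ of an integer is set when $i\in I$, uses exact rational arithmetic from the standard library, and the helper names `set_partitions`, `moments`, `cumulants` and `moments_from_cumulants` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` as its own block
        for j in range(len(part)):                  # or `first` joined to block j
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    """Variables of the subset encoded by `mask` (bit i stands for variable i+1)."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def block_mask(block):
    return sum(1 << i for i in block)


def moments(p, n):
    """Formula (1): mu_I is the sum of p_J over all J containing I."""
    return [sum(p[J] for J in range(2 ** n) if J & I == I) for I in range(2 ** n)]


def cumulants(mu, n):
    """Formula (5): Moebius inversion over the partition lattice of each I."""
    k = [Fraction(0)] * (2 ** n)                    # k_emptyset = 0
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[block_mask(block)]
            k[I] += term
    return k


def moments_from_cumulants(k, n):
    """Formula (6): mu_I is the sum over partitions of I of products of k_B."""
    mu = [Fraction(1)] + [Fraction(0)] * (2 ** n - 1)   # mu_emptyset = 1
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction(1)
            for block in pi:
                term *= k[block_mask(block)]
            mu[I] += term
    return mu


# A 2x2x2 table with p_I = (I + 1)/36 for bitmasks I = 0..7; the entries sum to 1.
p = [Fraction(m + 1, 36) for m in range(8)]
mu = moments(p, 3)
k = cumulants(mu, 3)
recovered = moments_from_cumulants(k, 3)
```

Enumerating all set partitions costs a Bell number of terms per coefficient, so this direct use of (5) is only meant for small $n$.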

This article is organized as follows. In Section 2 we study the expression of hyperdeterminants in terms of cumulants. In Section 3 we show that $SL(2)^n$-invariant tensor varieties are defined by $\mathbb{Z}^n$-homogeneous polynomials in the higher order cumulants $k_I$ with $|I|\geq 2$. Section 4 concerns secants and tangents of the Segre variety, and we show (in Theorem 4.1) that the tangential variety becomes toric in cumulant coordinates. A conceptual explanation for this arises from our theory of hidden subset models, developed in Section 5. Here the main result is Theorem 5.1. Section 6 offers an algebraic study of the context-specific independence models due to Georgi and Schliep [6]. Section 7 explores the semialgebraic constraints on cumulants arising from probabilities, and it addresses a conjecture proposed in [1].



2. Hyperdeterminants 

One of the most intriguing polynomial functions on $2\times 2\times\cdots\times 2$-tables is the hyperdeterminant $\mathrm{Det}(P)$, which is a generalization of the determinant of a $2\times 2$ matrix. The hyperdeterminant, first introduced by Cayley in 1843, has many equivalent definitions (see [5]). One of them states that $\mathrm{Det}(P)$ is the (unique up to scaling) irreducible polynomial in the $p_I$ that vanishes whenever the complex hypersurface defined by the equation $P(x) = 0$ has a singular point in $\mathbb{C}^n$. Algebraically, the hyperdeterminant $\mathrm{Det}(P)$ is obtained by eliminating the $n$ unknowns $x_1, x_2, \dots, x_n$ from the $n+1$ equations

$$P(x) = \frac{\partial P}{\partial x_1}(x) = \frac{\partial P}{\partial x_2}(x) = \cdots = \frac{\partial P}{\partial x_n}(x) = 0.$$

According to [5, §14.2], $\mathrm{Det}(P)$ is a homogeneous polynomial of degree $C_n$ in the $2^n$ unknowns, where $\sum_{n\geq 0} C_n z^n/n! = e^{-2z}/(1-z)^2$. So, the degrees of our hyperdeterminants are $C_2 = 2$, $C_3 = 4$, $C_4 = 24$, $C_5 = 128$, etc.
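The degrees $C_n$ can be read off from the exponential generating function by a Cauchy product of $e^{-2z} = \sum_j (-2)^j z^j/j!$ with $(1-z)^{-2} = \sum_m (m+1)z^m$. A minimal Python sketch, with the hypothetical helper name `hyperdeterminant_degrees`:

```python
from fractions import Fraction
from math import factorial


def hyperdeterminant_degrees(n_max):
    """Return [C_0, ..., C_{n_max}] where sum_n C_n z^n / n! = exp(-2z) / (1 - z)^2."""
    # Taylor coefficients of exp(-2z)
    exp_part = [Fraction((-2) ** j, factorial(j)) for j in range(n_max + 1)]
    degrees = []
    for n in range(n_max + 1):
        # Cauchy product with 1/(1 - z)^2 = sum_m (m + 1) z^m
        coeff = sum(exp_part[j] * (n - j + 1) for j in range(n + 1))
        value = coeff * factorial(n)
        assert value.denominator == 1          # the C_n are integers
        degrees.append(value.numerator)
    return degrees


print(hyperdeterminant_degrees(5)[2:])  # [2, 4, 24, 128]
```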

We work in the $(2^n-1)$-dimensional affine space of distributions defined by $\sum_I p_I = 1$, or $\mu_\emptyset = 1$. We seek to express the hyperdeterminant on that affine space in terms of the cumulants $k_I$. From such an expression one recovers a formula for $\mathrm{Det}(P)$ in terms of the original coordinates $p_I$, up to scaling, by using (1) and (5).

If $n = 2$ then the hyperdeterminant is the determinant of a $2\times 2$-matrix,

$$P = \begin{pmatrix} p_\emptyset & p_2 \\ p_1 & p_{12} \end{pmatrix}.$$

In statistics, this represents the independence model for two binary random variables, and we recover the well-known fact that independence is equivalent to the vanishing of the covariance:

$$\mathrm{Det}(P) = p_\emptyset p_{12} - p_1p_2 = \mu_{12} - \mu_1\mu_2 = k_{12}.$$



The statistical meaning of larger hyperdeterminants will be discussed later. 
See, in particular, the context-specific independence model in Example 6.2. 
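If $n = 3$ then, by [5, Proposition 14.1.7], the hyperdeterminant equals

$$\begin{aligned} \mathrm{Det}(P) ={}& \mu_\emptyset^2\mu_{123}^2 + \mu_1^2\mu_{23}^2 + \mu_2^2\mu_{13}^2 + \mu_3^2\mu_{12}^2 + 4\,(\mu_\emptyset\mu_{12}\mu_{13}\mu_{23} + \mu_1\mu_2\mu_3\mu_{123}) \\ &- 2\,(\mu_\emptyset\mu_1\mu_{23}\mu_{123} + \mu_\emptyset\mu_2\mu_{13}\mu_{123} + \mu_\emptyset\mu_3\mu_{12}\mu_{123} + \mu_1\mu_2\mu_{13}\mu_{23} + \mu_1\mu_3\mu_{12}\mu_{23} + \mu_2\mu_3\mu_{12}\mu_{13}). \end{aligned} \qquad (7)$$

Here we can use either $\mu_I$ or $p_I$ since $\mathrm{Det}(P)$ is $SL(2)^3$-invariant. The formula simplifies considerably after we set $\mu_\emptyset = 1$ and replace moments by cumulants via (4) or (6):

$$\mathrm{Det}(P) = k_{123}^2 + 4k_{12}k_{13}k_{23}. \qquad (8)$$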
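Formulas (7) and (8), together with the $SL(2)^3$-invariance used above, can be checked with exact arithmetic on a single example. In the sketch below (our own illustration; tensor entries are indexed by bitmasks with bit $i-1$ standing for variable $i$, and the names `cayley` and `tangle` are hypothetical), the same quartic is evaluated on the probability table and on the moment table, and compared with the cumulant form (8).

```python
from fractions import Fraction


def moments(p):
    """Formula (1) for n = 3; entries are indexed by bitmasks 0..7."""
    return [sum(p[J] for J in range(8) if J & I == I) for I in range(8)]


def cayley(t):
    """Formula (7); t lists the entries for I = {}, 1, 2, 12, 3, 13, 23, 123."""
    e, t1, t2, t12, t3, t13, t23, t123 = t
    return (e**2 * t123**2 + t1**2 * t23**2 + t2**2 * t13**2 + t3**2 * t12**2
            + 4 * (e * t12 * t13 * t23 + t1 * t2 * t3 * t123)
            - 2 * (e * t1 * t23 * t123 + e * t2 * t13 * t123 + e * t3 * t12 * t123
                   + t1 * t2 * t13 * t23 + t1 * t3 * t12 * t23 + t2 * t3 * t12 * t13))


def tangle(mu):
    """Formula (8); assumes mu[0] = mu_emptyset = 1."""
    k12 = mu[3] - mu[1] * mu[2]
    k13 = mu[5] - mu[1] * mu[4]
    k23 = mu[6] - mu[2] * mu[4]
    k123 = mu[7] - mu[1] * mu[6] - mu[2] * mu[5] - mu[3] * mu[4] + 2 * mu[1] * mu[2] * mu[4]
    return k123**2 + 4 * k12 * k13 * k23


p = [Fraction(w, 36) for w in (3, 1, 4, 1, 5, 9, 2, 11)]   # an arbitrary distribution
mu = moments(p)
print(cayley(p) == cayley(mu) == tangle(mu))
```

The passage from `p` to `mu` applies the matrix $\begin{pmatrix}1&1\\0&1\end{pmatrix}\in SL(2)$ in each factor, which is why the two evaluations of `cayley` agree.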

This 2 x 2 x 2-hyperdeterminant is also known as the tangle, and it appears 
in phylogenetics [25], quantum computation [12] and string theory [ ]. 

The next case $n = 4$ is much more challenging. According to Huggins et al. [10], the $2\times 2\times 2\times 2$-hyperdeterminant has precisely 2,894,276 terms when written as a polynomial of degree 24 in either the probabilities $p_I$ or the moments $\mu_I$. However, the expansion of $\mathrm{Det}(P)$ in terms of cumulants $k_I$ is much smaller. The following theorem is our main result in this section.

Theorem 2.1. The $2\times\cdots\times 2$-hyperdeterminant $\mathrm{Det}(P)$ is a polynomial function in the $2^n - n - 1$ higher cumulants $\{k_I : |I|\geq 2\}$. It is homogeneous of degree $\frac{1}{2}(C_n, C_n, \dots, C_n)$ in the $\mathbb{Z}^n$-grading given by $\deg(k_I) = \sum_{i\in I} e_i$, where $e_i$ is the $i$-th unit vector of $\mathbb{Z}^n$. For $n = 4$, the hyperdeterminant $\mathrm{Det}(P)$ has precisely 13,819 monomials in the 11 unknowns $k_I$, all $\mathbb{Z}^4$-homogeneous of degree $(12,12,12,12)$, and their total degrees range from 15 to 24.

Proof. The expression of the hyperdeterminant in terms of the moments $\mu_I$ coincides with the $A$-discriminant (cf. [5]) of the moment generating function

$$M(x) = \sum_{I\subseteq[n]} \mu_I \prod_{i\in I} x_i = \exp(K(x)). \qquad (9)$$

Here $A$ is the $(n+1)\times 2^n$ matrix whose columns are the homogeneous coordinates of the vertices of the standard $n$-cube. Standard results on $A$-discriminants ensure that $\mathrm{Det}(P)$ is homogeneous in the $\mathbb{Z}^{n+1}$-grading specified by $A$, so, in particular, it is homogeneous in the coarser $\mathbb{Z}^n$-grading given by $\deg(\mu_I) = \sum_{i\in I} e_i$. Since the degree of $\mathrm{Det}(P)$ in the standard $\mathbb{Z}$-grading $\deg(\mu_I) = 1$ equals $C_n$, as discussed above, we find that $\mathrm{Det}(P)$ is $\mathbb{Z}^n$-homogeneous of degree $\frac{1}{2}(C_n, C_n, \dots, C_n)$.

The map (6) from moments to cumulants respects the $\mathbb{Z}^n$-grading, and we conclude that the expansion of $\mathrm{Det}(P)$ in cumulants is $\mathbb{Z}^n$-homogeneous of the same degree $\frac{1}{2}(C_n, C_n, \dots, C_n)$. The first assertion, that $\mathrm{Det}(P)$ does not depend on the first-order cumulants $k_1, \dots, k_n$, follows from Corollary 3.2.

We now come to the specific case $n = 4$. Here the proof was carried out by a computer calculation. We first set $k_1, k_2, k_3$ and $k_4$ to zero in the right-hand side of (9), since $\mathrm{Det}(P)$ does not depend on these first-order cumulants. Our task is then to evaluate the $A$-discriminant of the multilinear polynomial

$$\begin{aligned} M(x)\big|_{k_1=k_2=k_3=k_4=0} ={}& (k_{1234} + k_{12}k_{34} + k_{13}k_{24} + k_{14}k_{23})\,x_1x_2x_3x_4 \\ &+ k_{123}x_1x_2x_3 + k_{124}x_1x_2x_4 + k_{134}x_1x_3x_4 + k_{234}x_2x_3x_4 \\ &+ k_{12}x_1x_2 + k_{13}x_1x_3 + k_{14}x_1x_4 + k_{23}x_2x_3 + k_{24}x_2x_4 + k_{34}x_3x_4 + 1. \end{aligned}$$
This computation is done using Schläfli's formula [10, Prop. 3]. The resulting expansion of $\mathrm{Det}(P)$ has 13819 terms, all of $\mathbb{Z}^4$-degree $(12,12,12,12)$. The leading terms have total degree 24; they involve only the six cumulants $k_{12}, k_{13}, \dots, k_{34}$ (a monomial $\prod_I k_I^{e_I}$ of $\mathbb{Z}^4$-degree $(12,12,12,12)$ satisfies $\sum_I e_I|I| = 48$, so total degree 24 forces $|I| = 2$ throughout), and the first of them has coefficient 256. The last terms have total degree 15; among them is $k_{123}^3k_{124}^3k_{134}^3k_{234}^3k_{1234}^3$. □

Ideals generated by hyperdeterminants arise in various applications. We advocate writing these in terms of cumulants. One such application, studied by Holtz-Sturmfels [9] and Oeding [14], concerns the relations among principal minors of a general symmetric $n\times n$-matrix $A$. If we write $\mu_I$ for the minor with row and column indices $I\subseteq[n]$, and we treat the sequence $[\mu_I]$ as a sequence of formal moments, then the corresponding moment generating function takes the special form

$$M(x) = \det(\mathrm{Id} + AX), \quad \text{where } X = \mathrm{diag}(x_1,\dots,x_n).$$

Oeding [14] shows that the variety of such tables $M = [\mu_I]$ is cut out by polynomials of degree 4. These polynomials are obtained by acting with the group $SL(2)^n$ on the $2\times 2\times 2$-hyperdeterminants of all subtables. We reparametrize our variety of principal minors using the cumulant generating function:

$$K(x) = \log\det(\mathrm{Id} + AX) = \mathrm{trace}\,\log(\mathrm{Id} + AX) = \mathrm{trace}\Big(\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k}(AX)^k\Big).$$

The coefficients $k_I$ of the squarefree terms are sums over all cycle monomials in $A$ that are supported on $I$. Their algebraic relations can be computed more easily than those among the principal minors. We demonstrate this for $n = 4$:

$$\begin{aligned} K(x) &= \sum_{I\subseteq[4]} k_I \prod_{i\in I} x_i \\ &= \sum_i a_{ii}x_i - \sum_{i<j} a_{ij}^2\,x_ix_j + 2\sum_{i<j<l} a_{ij}a_{il}a_{jl}\,x_ix_jx_l \\ &\quad - 2\,(a_{12}a_{13}a_{24}a_{34} + a_{12}a_{14}a_{23}a_{34} + a_{13}a_{14}a_{23}a_{24})\,x_1x_2x_3x_4. \end{aligned}$$
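The prime ideal of algebraic relations among the coefficients is found to be

$$\begin{aligned}
\langle\, & 4k_{12}k_{13}k_{23} + k_{123}^2,\ \ 4k_{12}k_{14}k_{24} + k_{124}^2,\ \ 4k_{13}k_{14}k_{34} + k_{134}^2,\ \ 4k_{23}k_{24}k_{34} + k_{234}^2, \\
& 4k_{12}k_{13}k_{14}k_{234} + k_{123}k_{124}k_{134},\ \ 4k_{12}k_{23}k_{24}k_{134} + k_{123}k_{124}k_{234}, \\
& 4k_{13}k_{23}k_{34}k_{124} + k_{123}k_{134}k_{234},\ \ 4k_{14}k_{24}k_{34}k_{123} + k_{124}k_{134}k_{234}, \\
& 2k_{12}k_{13}k_{234} + 2k_{12}k_{23}k_{134} + 2k_{13}k_{23}k_{124} + k_{123}k_{1234}, \\
& 2k_{12}k_{14}k_{234} + 2k_{12}k_{24}k_{134} + 2k_{14}k_{24}k_{123} + k_{124}k_{1234}, \\
& 2k_{13}k_{14}k_{234} + 2k_{13}k_{34}k_{124} + 2k_{14}k_{34}k_{123} + k_{134}k_{1234}, \\
& 2k_{23}k_{24}k_{134} + 2k_{23}k_{34}k_{124} + 2k_{24}k_{34}k_{123} + k_{234}k_{1234}, \\
& -2k_{12}k_{13}k_{14}k_{1234} + k_{12}k_{13}k_{124}k_{134} + k_{12}k_{14}k_{123}k_{134} + k_{13}k_{14}k_{123}k_{124}, \\
& -2k_{12}k_{23}k_{24}k_{1234} + k_{12}k_{23}k_{124}k_{234} + k_{12}k_{24}k_{123}k_{234} + k_{23}k_{24}k_{123}k_{124}, \\
& -2k_{13}k_{23}k_{34}k_{1234} + k_{13}k_{23}k_{134}k_{234} + k_{13}k_{34}k_{123}k_{234} + k_{23}k_{34}k_{123}k_{134}, \\
& -2k_{14}k_{24}k_{34}k_{1234} + k_{14}k_{24}k_{134}k_{234} + k_{14}k_{34}k_{124}k_{234} + k_{24}k_{34}k_{124}k_{134}, \\
& k_{14}k_{123}k_{234} - k_{23}k_{124}k_{134},\ \ k_{13}k_{124}k_{234} - k_{24}k_{123}k_{134},\ \ k_{12}k_{134}k_{234} - k_{34}k_{123}k_{124}, \\
& 4(k_{12}k_{13}k_{24}k_{34} + k_{12}k_{14}k_{23}k_{34} + k_{13}k_{14}k_{23}k_{24}) \\
& \quad - 2(k_{14}k_{123}k_{234} + k_{24}k_{123}k_{134} + k_{34}k_{123}k_{124}) - k_{1234}^2 \,\rangle.
\end{aligned}$$

These twenty polynomials correspond to the hyperdeterminantal relations in [9, Thm. 8]. They furnish a compact encoding of this codimension 5 variety.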
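The cumulant description of principal minors lends itself to a quick numerical check. The sketch below is a hedged illustration, not the computation used for this paper: for one hypothetical symmetric integer matrix it computes the principal minors $\mu_I$ exactly, converts them to cumulants with formula (5), and evaluates five of the twenty generators listed above, together with the cycle expansion of $K(x)$. Subsets are bitmasks, with bit $i-1$ standing for index $i$; all helper names are our own.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def cumulants(mu, n):
    """Formula (5) applied to formal moments mu (indexed by bitmask)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


def det(matrix):
    """Exact determinant by Gaussian elimination; the empty matrix has determinant 1."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size, result = len(m), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for cc in range(c, size):
                m[r][cc] -= factor * m[c][cc]
    return result


def principal_minors(A):
    """mu_I = determinant of the submatrix of A on rows and columns in I."""
    return [det([[A[i][j] for j in bits(I)] for i in bits(I)]) for I in range(2 ** len(A))]


A = [[2, 1, 3, -1],
     [1, 5, 2, 4],
     [3, 2, -1, 1],
     [-1, 4, 1, 3]]                       # a hypothetical symmetric matrix
k = cumulants(principal_minors(A), 4)
k12, k13, k14, k23, k24, k34 = k[3], k[5], k[9], k[6], k[10], k[12]
k123, k124, k134, k234, k1234 = k[7], k[11], k[13], k[14], k[15]

relations = [
    4 * k12 * k13 * k23 + k123 ** 2,
    2 * k12 * k13 * k234 + 2 * k12 * k23 * k134 + 2 * k13 * k23 * k124 + k123 * k1234,
    -2 * k12 * k13 * k14 * k1234 + k12 * k13 * k124 * k134
    + k12 * k14 * k123 * k134 + k13 * k14 * k123 * k124,
    k14 * k123 * k234 - k23 * k124 * k134,
    4 * (k12 * k13 * k24 * k34 + k12 * k14 * k23 * k34 + k13 * k14 * k23 * k24)
    - 2 * (k14 * k123 * k234 + k24 * k123 * k134 + k34 * k123 * k124) - k1234 ** 2,
]
```

For this matrix the cycle expansion predicts $k_{12} = -a_{12}^2 = -1$, $k_{123} = 2a_{12}a_{13}a_{23} = 12$ and $k_{1234} = 28$, and each listed relation should evaluate to zero.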
3. Invariance and Independence

The algebraic relations in Section 2 did not involve any of the order one cumulants $k_1,\dots,k_n$, and they were homogeneous with respect to the $\mathbb{Z}^n$-grading given by $\deg(k_I) = \sum_{i\in I} e_i$. In this section we argue that these properties hold for all statistically meaningful varieties in the space of $2\times\cdots\times 2$-tables.

To compute moments in (1) we used the convention that the (formal) random vector $X = (X_1,\dots,X_n)$ has values in $\{0,1\}^n$. Other authors prefer the choice $\{-1,1\}^n$, and this leads to rather different formulas for the moments (see [1, Equation (2.1)]). A meaningful statistical model will not depend on such choices. Hence we are only interested in cumulant varieties that do not depend on such choices.

Suppose we replace each of our random variables $X_i$ by a new variable $X'_i$ which takes values $a_i$ and $b_i$ instead of 1 and 0. If the probability distribution is the same on both state spaces, then the cumulants are transformed via

$$k'_I = k_I\cdot\prod_{i\in I}(a_i - b_i) \quad \text{for all } I\subseteq[n] \text{ with } |I|\geq 2, \qquad (10)$$

and $k'_i = (a_i - b_i)k_i + b_i$ for $i = 1,\dots,n$. This result is purely algebraic, and it remains true if we replace probability distributions with arbitrary complex distributions. In geometric language, changing the values of the binary variables $X_i$ corresponds to a natural action of the $n$-dimensional torus $(\mathbb{C}^*)^n$ with coordinates $a_i - b_i$ on the space $\mathbb{C}^{2^n-n-1}$ whose coordinates are the higher cumulants $k_I$, $|I|\geq 2$. This action is compatible with the $\mathbb{Z}^n$-grading:

Theorem 3.1. A subvariety of $\mathbb{C}^{2^n-1}$ is invariant under changing values of the $X_i$ if and only if it is defined by $\mathbb{Z}^n$-homogeneous polynomials in the $k_I$ with $|I|\geq 2$.

Proof. Let $V$ be a subvariety in the space $\mathbb{C}^{2^n-1}$ whose coordinates are all the cumulants. Suppose that $V$ is invariant under replacing the values $(0,1)$ of $X_i$ by any $(b_i, a_i)$. If the new values satisfy $a_i = b_i + 1$, then the higher cumulants $k_I$, $|I|\geq 2$, remain unchanged, but the vector $(k_1,\dots,k_n)$ is shifted to $(k_1+b_1,\dots,k_n+b_n)$. Hence the ideal $I_V$ of $V$ is generated by polynomials that do not depend on the linear cumulants $k_1,\dots,k_n$. By fixing $b_i = 0$ and varying $a_i$, we see that $V$ is invariant under the torus action (10). Hence its ideal $I_V$ is $\mathbb{Z}^n$-homogeneous, and this proves the only-if direction. The if-direction holds by essentially the same argument. □
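Formula (10) can be tested by computing joint moments of the relabelled variables $X'_i$ directly and then applying (5), which expresses joint cumulants in terms of joint moments for any random vector. The following Python sketch is our own illustration; the relabelling values $a_i, b_i$ and the distribution are arbitrary choices, and `relabeled_moments` is a hypothetical helper.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def cumulants(mu, n):
    """Formula (5): joint cumulants from joint moments (indexed by bitmask)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


def relabeled_moments(p, n, a, b):
    """Joint moments E[prod_{i in I} X'_i], where X'_i = a[i] if X_i = 1, else b[i]."""
    mu = []
    for I in range(2 ** n):
        total = Fraction(0)
        for J in range(2 ** n):              # J is an outcome of X, with probability p[J]
            value = Fraction(1)
            for i in bits(I):
                value *= a[i] if (J >> i) & 1 else b[i]
            total += p[J] * value
        mu.append(total)
    return mu


p = [Fraction(w, 36) for w in (3, 1, 4, 1, 5, 9, 2, 11)]
a = [Fraction(3), Fraction(-1, 2), Fraction(5)]      # new value replacing state 1
b = [Fraction(1), Fraction(2), Fraction(-3)]         # new value replacing state 0
k = cumulants(relabeled_moments(p, 3, [1, 1, 1], [0, 0, 0]), 3)   # usual {0,1} coding
k_new = cumulants(relabeled_moments(p, 3, a, b), 3)
```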

The group $SL(2)^n$ acts on the tensor space $\mathbb{C}^{2\times 2\times\cdots\times 2}$, and many important varieties are invariant under this action. In particular, they are invariant under $U(2)^n$, where $U(2)$ is the unipotent group of $2\times 2$-matrices of the form

$$\begin{pmatrix} 1 & 0 \\ \lambda & 1 \end{pmatrix} \quad \text{for } \lambda\in\mathbb{C}.$$

The invariance property of Theorem 3.1 is implied by $SL(2)^n$-invariance, as the following corollary shows.

Corollary 3.2. Let $V$ be a subvariety of the affine open subset $\{\mu_\emptyset = 1\}$ in the projective space $\mathbb{P}(\mathbb{C}^{2\times 2\times\cdots\times 2})$, and let $\overline{V}$ denote its closure in that projective space. If $\overline{V}$ is invariant under the action of $SL(2)^n$, then the ideal $I_V$ that defines $V$ is generated by $\mathbb{Z}^n$-homogeneous polynomials in the $k_I$ with $|I|\geq 2$.

Proof. The unipotent group $U(2)^n$ acts on the moment generating function via $M(x)\mapsto M(x)\prod_{i=1}^n(1+\lambda_ix_i)$. Modulo the ideal $\langle x_1^2,\dots,x_n^2\rangle$ we have

$$M(x)\prod_{i=1}^n(1+\lambda_ix_i) = M(x)\exp\Big(\sum_{i=1}^n\lambda_ix_i\Big) = \exp\Big(K(x) + \sum_{i=1}^n\lambda_ix_i\Big).$$

This means that $U(2)^n$ acts on the space of cumulants by shifting the first-order cumulants. We conclude that the prime ideal of $V$ is generated by polynomials in the cumulants $k_I$ with $|I|\geq 2$. Since $\overline{V}$ is also invariant under tuples of diagonal $2\times 2$-matrices in $SL(2)^n$, these ideal generators can be chosen to be $\mathbb{Z}^n$-homogeneous. □

Hyperdeterminants and their ideals in Section 2 are $SL(2)^n$-invariant and hence expressible by $\mathbb{Z}^n$-homogeneous polynomials in higher cumulants.

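Example 3.3. The converse does not hold in Corollary 3.2. Fix $n = 3$, let $\beta\in\mathbb{C}\setminus\{4\}$, and consider the hypersurface in $\{\mu_\emptyset = 1\}\subset\mathbb{P}(\mathbb{C}^{2\times 2\times 2})$ defined by

$$k_{123}^2 + \beta\cdot k_{12}k_{13}k_{23} = 0.$$

This equation has degree six when written in the (homogenized) moments:

$$\mathrm{Det}(P)\,\mu_\emptyset^2 + (\beta-4)(\mu_\emptyset\mu_{12} - \mu_1\mu_2)(\mu_\emptyset\mu_{13} - \mu_1\mu_3)(\mu_\emptyset\mu_{23} - \mu_2\mu_3).$$

This defines a sextic hypersurface in $\mathbb{P}(\mathbb{C}^{2\times 2\times 2})$ that is $U(2)^3$-invariant but not $SL(2)^3$-invariant. The formula in probabilities is even less invariant:

$$\begin{aligned} &\mathrm{Det}(P)\cdot(p_\emptyset + p_1 + p_2 + p_3 + p_{12} + p_{13} + p_{23} + p_{123})^2 \\ &+ (\beta-4)\,(p_\emptyset p_{23} + p_\emptyset p_{123} + p_1p_{23} + p_1p_{123} - p_2p_3 - p_2p_{13} - p_3p_{12} - p_{12}p_{13}) \\ &\qquad\cdot(p_\emptyset p_{13} + p_\emptyset p_{123} - p_1p_3 - p_1p_{23} + p_2p_{13} + p_2p_{123} - p_3p_{12} - p_{12}p_{23}) \\ &\qquad\cdot(p_\emptyset p_{12} + p_\emptyset p_{123} - p_1p_2 - p_1p_{23} - p_2p_{13} + p_3p_{12} + p_3p_{123} - p_{13}p_{23}). \end{aligned}$$

Of course, for $\beta = 4$, this is the hyperdeterminantal quartic $\{\mathrm{Det}(P) = 0\}$. □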
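The three quadratic factors in the probability formula are the homogenized covariances $\mu_\emptyset\mu_{ij} - \mu_i\mu_j$ rewritten via (1). The sketch below (our own; the helper names are hypothetical, and the table is deliberately not normalized, since the identity is homogeneous) checks this on one integer table indexed by bitmasks.

```python
from fractions import Fraction


def moments(p):
    """Formula (1) for n = 3 without normalization; p is indexed by bitmasks 0..7."""
    return [sum(p[J] for J in range(8) if J & I == I) for I in range(8)]


def covariance_quadrics_from_moments(p):
    """(mu_0 mu_12 - mu_1 mu_2, mu_0 mu_13 - mu_1 mu_3, mu_0 mu_23 - mu_2 mu_3)."""
    e, m1, m2, m12, m3, m13, m23, _ = moments(p)
    return (e * m12 - m1 * m2, e * m13 - m1 * m3, e * m23 - m2 * m3)


def covariance_quadrics_from_probabilities(p):
    """The three quadratic factors of Example 3.3, written directly in the p_I."""
    e, p1, p2, p12, p3, p13, p23, p123 = p
    q12 = e*p12 + e*p123 - p1*p2 - p1*p23 - p2*p13 + p3*p12 + p3*p123 - p13*p23
    q13 = e*p13 + e*p123 - p1*p3 - p1*p23 + p2*p13 + p2*p123 - p3*p12 - p12*p23
    q23 = e*p23 + e*p123 + p1*p23 + p1*p123 - p2*p3 - p2*p13 - p3*p12 - p12*p13
    return (q12, q13, q23)


p = [Fraction(w) for w in (3, 1, 4, 1, 5, 9, 2, 6)]   # an unnormalized table
print(covariance_quadrics_from_moments(p) == covariance_quadrics_from_probabilities(p))
```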

The most basic statistical model for $n$ binary random variables $X_i$ is the model of complete independence, denoted $X_1\perp\!\!\!\perp X_2\perp\!\!\!\perp\cdots\perp\!\!\!\perp X_n$, which is the Segre variety $(\mathbb{P}^1)^n\subset\mathbb{P}^{2^n-1}$. In terms of moments, this is parametrized by $M(x) = \prod_{i=1}^n(1+\mu_ix_i)$. In terms of cumulants, we obtain $K(x) = \sum_{i=1}^n\log(1+\mu_ix_i) = \sum_{i=1}^n\mu_ix_i$. In probability coordinates, the Segre variety is defined by certain $2\times 2$-determinants $p_Ip_J - p_Kp_L$, but we see that this simplifies when we use cumulants as coordinates:

Remark 3.4. The Segre variety is defined by $k_I = 0$ for all $|I|\geq 2$.

The Segre variety is the intersection of the independence models $A\perp\!\!\!\perp B$, where $A|B$ runs over all partitions of the set $[n]$ into two blocks. The equations for $A\perp\!\!\!\perp B$ are $k_I = 0$ for all $I$ with $A\cap I\neq\emptyset$ and $B\cap I\neq\emptyset$. The model $A\perp\!\!\!\perp B$ also makes sense when $A\cup B$ is a proper subset of $[n]$, with equations as follows.
Proposition 3.5. If $A$ and $B$ are disjoint subsets of $[n]$, then the independence model $A\perp\!\!\!\perp B$ is defined by $k_I = 0$, where $I\subseteq A\cup B$, $A\cap I\neq\emptyset$ and $B\cap I\neq\emptyset$.

Proof. The independence model $A\perp\!\!\!\perp B$ has the moment parametrization

$$M(x) = M_1(x_i : i\in A)\cdot M_2(x_j : j\in B) + \sum_{l\in[n]\setminus(A\cup B)} x_l\,N_l(x).$$

By taking the logarithm, we find that

$$K(x) = \log(M_1(x_i : i\in A)) + \log(M_2(x_j : j\in B)) \mod \langle x_l : l\in[n]\setminus(A\cup B)\rangle.$$

This form is equivalent to the asserted vanishing condition on cumulants. □
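Remark 3.4 and Proposition 3.5 can be illustrated on two small distributions. In this sketch (our own, with hypothetical helper names and bitmask-encoded subsets), a product distribution has all higher cumulants equal to zero, while a distribution in which only $X_1\perp\!\!\!\perp X_2$ holds has $k_{12} = 0$ but $k_{13}\neq 0$. In the second example we assume that $X_1, X_2$ are fair coins and $\mathrm{Prob}(X_3 = 1\mid x_1, x_2) = (1 + x_1 + x_2)/4$.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def moments(p, n):
    """Formula (1)."""
    return [sum(p[J] for J in range(2 ** n) if J & I == I) for I in range(2 ** n)]


def cumulants(mu, n):
    """Formula (5)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


def product_distribution(q):
    """Complete independence: p_J = prod_{i in J} q_i * prod_{i not in J} (1 - q_i)."""
    n = len(q)
    p = []
    for J in range(2 ** n):
        value = Fraction(1)
        for i in range(n):
            value *= q[i] if (J >> i) & 1 else 1 - q[i]
        p.append(value)
    return p


q = [Fraction(1, 3), Fraction(3, 4), Fraction(2, 5)]
k_indep = cumulants(moments(product_distribution(q), 3), 3)

# Only X_1 and X_2 are independent; X_3 depends on both of them.
p_partial = []
for J in range(8):
    x1, x2, x3 = J & 1, (J >> 1) & 1, (J >> 2) & 1
    r = Fraction(1 + x1 + x2, 4)                    # Prob(X_3 = 1 | x_1, x_2)
    p_partial.append(Fraction(1, 4) * (r if x3 else 1 - r))
k_partial = cumulants(moments(p_partial, 3), 3)
```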

The symmetry group of the $n$-cube is the semidirect product of the symmetric group $S_n$, which permutes $[n]$, and the abelian group $\mathbb{Z}_2^n$, which swaps 0s and 1s. This gives rise to an action on $\mathbb{C}^{2\times 2\times\cdots\times 2}$. We identify elements $\rho\in\mathbb{Z}_2^n$ with subsets $J\subseteq[n]$. The action on coordinates $p_I$ is as follows: for $\sigma\in S_n$ we have $\sigma(p_I) = p_{\sigma(I)}$, and for $J\subseteq[n]$ we have $\rho_J(p_I) = p_{I\Delta J}$, where $I\Delta J = (I\setminus J)\cup(J\setminus I)$. Since this is a group action, we have

$$\rho_J(p_I) = \rho_{j_1}\circ\cdots\circ\rho_{j_r}(p_I) \quad \text{for all } J = \{j_1,\dots,j_r\}\subseteq[n]. \qquad (11)$$

We extend this action from probability coordinates to arbitrary polynomials in them in the natural way; in particular, the group acts on the cumulant coordinates. This action is simple for permutations $\sigma\in S_n$: we have $\sigma(k_I) = k_{\sigma(I)}$. The action of the group $\mathbb{Z}_2^n$ is more subtle, and it can be characterized by the following corollary. That result will help us in Section 7 to get a more compact semialgebraic description of the space of cumulants, by taking advantage of the symmetries in our problem.

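Corollary 3.6. Consider the cumulants $k_I$ as polynomials in probabilities $p_I$, via (1) and (5). For $I, J\subseteq[n]$ with $|I|\geq 2$, the element $\rho_J\in\mathbb{Z}_2^n$ satisfies

$$\rho_J(k_I) = \begin{cases} -k_I & \text{if } |J\cap I| \text{ is odd}, \\ k_I & \text{otherwise}. \end{cases} \qquad (12)$$

Furthermore, for each $i = 1,\dots,n$, we have

$$\rho_J(k_i) = \begin{cases} 1 - k_i & \text{if } i\in J, \\ k_i & \text{otherwise}. \end{cases}$$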

Proof. By (11) it suffices to show that $\rho_i(k_I) = -k_I$ if $i\in I$ and $\rho_i(k_I) = k_I$ if $i\notin I$. Formula (12) follows from (10) by taking $a_i = 0$ and $b_i = 1$, i.e. we swap the states of the $i$-th variable $X_i$, and similarly for first-order cumulants. □
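Corollary 3.6 can be checked by permuting the entries of a probability table. In the sketch below (our own illustration, with hypothetical helper names), swapping the states of $X_1$ and $X_3$ corresponds to $J = \{1,3\}$, encoded as the bitmask `0b101`, and the symmetric difference $I\Delta J$ is the bitwise XOR `I ^ J`.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def moments(p, n):
    """Formula (1)."""
    return [sum(p[J] for J in range(2 ** n) if J & I == I) for I in range(2 ** n)]


def cumulants(mu, n):
    """Formula (5)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


p = [Fraction(w, 36) for w in (3, 1, 4, 1, 5, 9, 2, 11)]
J = 0b101                                    # J = {1, 3}: swap the states of X_1 and X_3
p_swapped = [p[I ^ J] for I in range(8)]     # rho_J(p_I) = p_{I Delta J}
k = cumulants(moments(p, 3), 3)
k_swapped = cumulants(moments(p_swapped, 3), 3)
```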

4. Tangents and secants of the Segre variety 

In Remark 3.4 we saw that the Segre variety $(\mathbb{P}^1)^n\subset\mathbb{P}(\mathbb{C}^{2\times\cdots\times 2})$ collapses to a single point in the space $\mathbb{C}^{2^n-n-1}$ of higher cumulants. This raises the question of what the representation in the $k_I$ with $|I|\geq 2$ looks like for varieties naturally associated to $(\mathbb{P}^1)^n$, such as its secant and tangential varieties. We here examine the first tangential variety and the first secant variety:

$$\mathrm{Tan}((\mathbb{P}^1)^n) = \text{closure of } \{x\in\mathbb{P}^{2^n-1} \mid x \text{ lies on a line tangent to } (\mathbb{P}^1)^n\},$$
$$\mathrm{Sec}((\mathbb{P}^1)^n) = \text{closure of } \{x\in\mathbb{P}^{2^n-1} \mid x \text{ lies on a secant line of } (\mathbb{P}^1)^n\}.$$

Our next result reveals that the tangential variety is toric in the cumulants. 

Theorem 4.1. The image of the tangential variety $\mathrm{Tan}((\mathbb{P}^1)^n)$ in the space of higher cumulants $\mathbb{C}^{2^n-n-1}$ is isomorphic to the $n$-dimensional affine toric variety parametrized by all square-free monomials of degree $\geq 2$.

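Proof. In the tensor notation of [13], $\mathrm{Tan}((\mathbb{P}^1)^n)$ has the parametrization

$$M = \frac{1}{n}\sum_{i=1}^n a^{(1)}\otimes\cdots\otimes a^{(i-1)}\otimes b^{(i)}\otimes a^{(i+1)}\otimes\cdots\otimes a^{(n)},$$

where $a^{(i)} = (1, a_i)$ and $b^{(i)} = (1, b_i)$ are vectors representing points in the distinguished affine open subset of $\mathbb{P}^1$. The formula above translates into the following parametrization of moment generating functions:

$$M(x) = \frac{1}{n}\sum_{j=1}^n (1+b_jx_j)\prod_{i\neq j}(1+a_ix_i). \qquad (13)$$

We compute the logarithm of the series $M(x)$ modulo $\langle x_1^2, x_2^2, \dots, x_n^2\rangle$. Since $(1+b_jx_j)/(1+a_jx_j)\equiv 1 + (b_j - a_j)x_j$ modulo $x_j^2$, we have $M(x)\equiv\prod_{i=1}^n(1+a_ix_i)\cdot\big(1 + \sum_{i=1}^n s_ix_i\big)$ with $s_i = (b_i - a_i)/n$. Disregarding linear combinations of $x_1,\dots,x_n$ (which only affect the first-order cumulants), we obtain

$$\log M(x) \equiv \log\Big(1 + \sum_{i=1}^n s_ix_i\Big) = \sum_{\emptyset\neq I\subseteq[n]} (-1)^{|I|-1}(|I|-1)!\prod_{i\in I} s_ix_i. \qquad (14)$$

The identity on the right can be proved directly by manipulating generating functions. An alternative and more detailed proof will be given in Example 5.2. We now conclude that

$$k_I = (-1)^{|I|-1}(|I|-1)!\prod_{i\in I} s_i \quad \text{for } |I|\geq 2. \qquad (15)$$

This monomial parametrization shows that $\mathrm{Tan}((\mathbb{P}^1)^n)$ is toric in cumulants. It is isomorphic to the toric variety with parametrization $k_I\mapsto\prod_{i\in I} s_i$. □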
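The monomial parametrization (15) can be confirmed numerically from (13). The Python sketch below (our own; the parameter values are arbitrary and `tangent_moments` is a hypothetical helper) expands (13) into its square-free coefficients, converts them to cumulants with (5), and compares the higher cumulants with $(-1)^{|I|-1}(|I|-1)!\prod_{i\in I}s_i$, where $s_i = (b_i - a_i)/n$.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def cumulants(mu, n):
    """Formula (5)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


def tangent_moments(a, b):
    """Square-free coefficients of (13): (1/n) sum_j (1 + b_j x_j) prod_{i != j} (1 + a_i x_i)."""
    n = len(a)
    mu = []
    for I in range(2 ** n):
        total = Fraction(0)
        for j in range(n):
            term = Fraction(1)
            for i in bits(I):
                term *= b[i] if i == j else a[i]
            total += term
        mu.append(total / n)
    return mu


a = [Fraction(2), Fraction(-1), Fraction(1, 2), Fraction(3)]
b = [Fraction(5), Fraction(1), Fraction(-2), Fraction(1, 3)]
n = len(a)
s = [(b[i] - a[i]) / n for i in range(n)]      # toric parameters of (15)
k = cumulants(tangent_moments(a, b), n)
```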

We easily find the cumulant ideal of $\mathrm{Tan}((\mathbb{P}^1)^n)$ by computing a Markov basis for the toric ideal of relations among squarefree monomials of degree $\geq 2$. We then rescale to adjust to the signs and factorials appearing in (15).

Example 4.2. Let $n = 5$. Then the toric ideal of $\mathrm{Tan}((\mathbb{P}^1)^5)$ is minimally generated by 120 binomials in the 26 cumulant coordinates. Among these generators, 75 are quadrics and 45 are cubics. The quadrics include binomials such as $k_{12}k_{34} - k_{14}k_{23}$, $k_{123}k_{45} - k_{12}k_{345}$, $k_{123}k_{345} - k_{135}k_{234}$, $k_{1234} + 6k_{14}k_{23}$, and $k_{12345} + 12k_{12}k_{345}$. The cubics include binomials such as $k_{123}^2 + 4k_{12}k_{13}k_{23}$ and $k_{123}k_{124} + 4k_{12}k_{14}k_{23}$. □
We now come to the secant variety $\mathrm{Sec}((\mathbb{P}^1)^n)$. This is not a toric variety in cumulants. For example, for $n = 4$ it has the following parametrization:

$$M = (1-t)\,A\otimes B\otimes C\otimes D + t\,E\otimes F\otimes G\otimes H,$$
$$M(x) = (1-t)(1+ax_1)(1+bx_2)(1+cx_3)(1+dx_4) + t(1+ex_1)(1+fx_2)(1+gx_3)(1+hx_4).$$

Here $A = (1,a), \dots, H = (1,h)$, and $t$ is a complex mixing parameter.

The image of $\mathrm{Sec}((\mathbb{P}^1)^4)$ in the 11-dimensional space of higher cumulants is a 5-dimensional affine variety that is not toric. Its ideal is generated by 16 polynomials in $k_{12}, k_{13}, \dots, k_{1234}$. These are the ten binomial quadrics

$$\begin{aligned} &k_{12}k_{34} - k_{14}k_{23},\quad k_{13}k_{24} - k_{14}k_{23},\quad k_{12}k_{134} - k_{14}k_{123},\quad k_{13}k_{124} - k_{14}k_{123}, \\ &k_{12}k_{234} - k_{24}k_{123},\quad k_{23}k_{124} - k_{24}k_{123},\quad k_{13}k_{234} - k_{34}k_{123}, \\ &k_{23}k_{134} - k_{34}k_{123},\quad k_{14}k_{234} - k_{34}k_{124},\quad k_{24}k_{134} - k_{34}k_{124}, \end{aligned} \qquad (16)$$

and the six non-binomial cubics

$$\begin{aligned} &k_{12}L - (k_{123}k_{124} + 4k_{12}k_{14}k_{23}),\quad k_{13}L - (k_{123}k_{134} + 4k_{13}k_{14}k_{23}), \\ &k_{14}L - (k_{124}k_{134} + 4k_{14}^2k_{23}),\quad k_{23}L - (k_{123}k_{234} + 4k_{14}k_{23}^2), \\ &k_{24}L - (k_{124}k_{234} + 4k_{24}k_{14}k_{23}),\quad k_{34}L - (k_{134}k_{234} + 4k_{34}k_{14}k_{23}). \end{aligned} \qquad (17)$$

Here $L = k_{1234} + 6k_{14}k_{23}$ is one of the toric relations satisfied by (15). Indeed, the tangential variety $\mathrm{Tan}((\mathbb{P}^1)^4)$ is a hypersurface in the secant variety $\mathrm{Sec}((\mathbb{P}^1)^4)$. Its toric ideal in cumulants has 21 minimal generators, namely, the ten quadrics in (16), the quadric $L$, the six parenthesized cubics in (17), and the four hyperdeterminants

$$k_{234}^2 + 4k_{23}k_{24}k_{34},\quad k_{134}^2 + 4k_{13}k_{14}k_{34},\quad k_{124}^2 + 4k_{12}k_{14}k_{24},\quad k_{123}^2 + 4k_{12}k_{13}k_{23}.$$

These various equations in cumulants can now be translated back into probability coordinates, using the substitutions (1) and (5). After homogenizing and saturating with $\mu_\emptyset$, we recover the 32-dimensional space of $3\times 3$-minors of flattenings for $\mathrm{Sec}((\mathbb{P}^1)^4)$, as in [17], and the 53 ideal generators for $\mathrm{Tan}((\mathbb{P}^1)^4)$, namely, the 32 cubics, the 20 hyperdeterminantal quartics, and the special quadric as in [13, §3.2].

For any $n\geq 4$, the secant variety $\mathrm{Sec}((\mathbb{P}^1)^n)$ is a curve over the toric variety $\mathrm{Tan}((\mathbb{P}^1)^n)$. Using cumulant coordinates, it has the parametrization

$$k_I = K_{|I|}(t)\cdot\prod_{i\in I} b_i, \qquad (18)$$

where the $b_i$ are complex parameters and $K_v(t)$ is a certain univariate polynomial of degree $v$; see (23). For example,

$$K_2(t) = -t^2 + t, \qquad K_3(t) = 2t^3 - 3t^2 + t, \qquad K_4(t) = -6t^4 + 12t^3 - 7t^2 + t. \qquad (19)$$

The leading coefficient of $K_v(t)$ equals the coefficient $(-1)^{v-1}(v-1)!$ in the parametrization (15) of the tangential variety. Using (19), we can now recover the equations (16) and (17) of the secant variety by implicitizing (18) for $n = 4$. The derivation of (18) and of the polynomials $K_v(t)$ will be explained in Example 5.3.
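The parametrization (18), with the polynomials (19), can be checked on one point of $\mathrm{Sec}((\mathbb{P}^1)^4)$. The sketch below (our own, with arbitrary parameter values and the hypothetical helpers `K` and `secant_moments`) computes the cumulants of the mixture $(1-t)\prod(1+a_ix_i) + t\prod(1+e_ix_i)$ and compares them with $K_{|I|}(t)\prod_{i\in I}(e_i - a_i)$; taking $b_i = e_i - a_i$ is our assumption, consistent with Theorem 5.1 and Example 5.3.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def cumulants(mu, n):
    """Formula (5)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


def K(v, t):
    """The polynomials K_v(t) of (19), for v = 2, 3, 4."""
    return {2: -t**2 + t,
            3: 2 * t**3 - 3 * t**2 + t,
            4: -6 * t**4 + 12 * t**3 - 7 * t**2 + t}[v]


def secant_moments(t, a, e):
    """Square-free coefficients of (1 - t) prod(1 + a_i x_i) + t prod(1 + e_i x_i)."""
    mu = []
    for I in range(2 ** len(a)):
        prod_a, prod_e = Fraction(1), Fraction(1)
        for i in bits(I):
            prod_a *= a[i]
            prod_e *= e[i]
        mu.append((1 - t) * prod_a + t * prod_e)
    return mu


t = Fraction(2, 7)
a = [Fraction(1, 2), Fraction(-3), Fraction(2), Fraction(1, 5)]
e = [Fraction(4), Fraction(1), Fraction(-1, 3), Fraction(2)]
k = cumulants(secant_moments(t, a, e), 4)
```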
5. Hidden subset models

We now introduce a highly overparametrized algebraic statistical model for a vector $X$ of $n$ binary random variables. It is called the complete hidden subset model. Here is a generative description of this model. A subset $I$ of $[n]$ (or, alternatively, a binary vector) is to be chosen at random. For each element $i\in[n]$ we need to decide whether $i$ is in $I$ or not. This is done as follows. First, a hidden subset $J$ is chosen with some probability $t_J$. Then we select $i$ for $I$ with probability $a_i^{(0)}$ if $i\notin J$, and we select $i$ for $I$ with probability $a_i^{(1)}$ if $i\in J$. The conditional probabilities $a_i^{(0)} = \mathrm{Prob}(i\in I\mid i\notin J)$ and $a_i^{(1)} = \mathrm{Prob}(i\in I\mid i\in J)$ are unrelated parameters that govern this process. The distributions in this model are parametrized as follows:

$$p_I = \sum_{J\subseteq[n]} t_J \prod_{i\in I^c\cap J^c}(1 - a_i^{(0)}) \prod_{i\in I^c\cap J}(1 - a_i^{(1)}) \prod_{i\in I\cap J^c} a_i^{(0)} \prod_{i\in I\cap J} a_i^{(1)},$$

where $I^c$ denotes the complement of $I\subseteq[n]$. The corresponding moment generating function has the parametrization

$$M(x) = \sum_{J\subseteq[n]} t_J\cdot\prod_{i\in J^c}(1 + a_i^{(0)}x_i)\prod_{i\in J}(1 + a_i^{(1)}x_i). \qquad (20)$$

The model has two parameters $a_j^{(0)}$ and $a_j^{(1)}$, with values between 0 and 1, for each $j\in[n]$. Further, it has $2^n$ mixing parameters $t_J$, one for each subset $J\subseteq[n]$. These parameters are nonnegative and they sum to 1, so the table $T = (t_J)_{J\subseteq[n]}$ is also a distribution. We write $k_I^T$ for the cumulants obtained from the table $T$.

Our main result in this section is the following intriguing theorem. 

Theorem 5.1. The complete hidden subset model is parametrized in terms of cumulants by $k_i = a_i^{(0)} + b_i\cdot k_i^T$, where $b_i = a_i^{(1)} - a_i^{(0)}$ for $i = 1,\dots,n$, and

$$k_I = k_I^T\cdot\prod_{i\in I} b_i \quad \text{for } |I|\geq 2. \qquad (21)$$
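Proof. We introduce a homogeneous probability generating function as

$$P_{\mathrm{hom}}(y^{(0)}, y^{(1)}) = \sum_{J\subseteq[n]} t_J \prod_{i\in J^c} y_i^{(0)} \prod_{i\in J} y_i^{(1)},$$

so that $P_{\mathrm{hom}}(1, x)$ is the probability generating function of the table $T$. Then the moment generating function of $X$ in (20) can be dually treated as the homogeneous version of the probability generating function of the hidden variables. Namely, for fixed $a_i^{(0)}$ and $a_i^{(1)}$, we write (20) as $M(x) = P_{\mathrm{hom}}(y^{(0)}, y^{(1)})$, where $y_i^{(0)} = 1 + a_i^{(0)}x_i$ and $y_i^{(1)} = 1 + a_i^{(1)}x_i$. From the homogeneous generating function $P_{\mathrm{hom}}$ we can obtain the homogeneous moment generating function $M^T_{\mathrm{hom}}$ of $T$ similarly as in the first equation in (2). Thus, setting $z_i = y_i^{(1)} - y_i^{(0)}$, we have

$$P_{\mathrm{hom}}(y^{(0)}, y^{(1)}) = P_{\mathrm{hom}}(y^{(0)}, z + y^{(0)}) = M^T_{\mathrm{hom}}(y^{(0)}, z). \qquad (22)$$

From this we find that $M(x) = P_{\mathrm{hom}}(y^{(0)}, y^{(1)})$ is equal to

$$M^T_{\mathrm{hom}}(y^{(0)}, z) = \sum_{J\subseteq[n]} \mu_J^T \prod_{i\in J} z_i \prod_{i\in J^c} y_i^{(0)}.$$

Since $z_i = y_i^{(1)} - y_i^{(0)} = b_ix_i$ and $y_i^{(0)} = 1 + a_i^{(0)}x_i$, the function $M^T_{\mathrm{hom}}(y^{(0)}, z)$ can be rewritten as a function of $x_1,\dots,x_n$ only:

$$M(x) = \sum_{J\subseteq[n]} \mu_J^T \prod_{i\in J} b_ix_i \prod_{i\in J^c}(1 + a_i^{(0)}x_i) = M^T(b_1x_1,\dots,b_nx_n)\prod_{i=1}^n(1 + a_i^{(0)}x_i).$$

The last equality holds modulo the ideal $\langle x_1^2,\dots,x_n^2\rangle$. This implies $K(x) = K^T(b_1x_1,\dots,b_nx_n) + \sum_{i=1}^n a_i^{(0)}x_i$, and hence $k_i = a_i^{(0)} + b_ik_i^T$ for $i = 1,\dots,n$, and $k_I = k_I^T\prod_{i\in I} b_i$ for every $I\subseteq[n]$ with $|I|\geq 2$. □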
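Theorem 5.1 can be spot-checked for $n = 3$: build $p_I$ from the displayed mixture formula, pass to cumulants via (1) and (5), and compare with $k_I^T\prod b_i$. The sketch below is our own illustration; the mixing table $T$ and the parameters $a_i^{(0)}, a_i^{(1)}$ are arbitrary rational choices, and `hidden_subset_probabilities` is a hypothetical helper.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def bits(mask):
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def moments(p, n):
    """Formula (1)."""
    return [sum(p[J] for J in range(2 ** n) if J & I == I) for I in range(2 ** n)]


def cumulants(mu, n):
    """Formula (5)."""
    k = [Fraction(0)] * (2 ** n)
    for I in range(1, 2 ** n):
        for pi in set_partitions(bits(I)):
            term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
            for block in pi:
                term *= mu[sum(1 << i for i in block)]
            k[I] += term
    return k


def hidden_subset_probabilities(t, a0, a1):
    """p_I: hidden J drawn with probability t[J], then i enters I with prob a0[i] or a1[i]."""
    n = len(a0)
    p = []
    for I in range(2 ** n):
        total = Fraction(0)
        for J in range(2 ** n):
            term = t[J]
            for i in range(n):
                prob_in = a1[i] if (J >> i) & 1 else a0[i]   # Prob(i in I | J)
                term *= prob_in if (I >> i) & 1 else 1 - prob_in
            total += term
        p.append(total)
    return p


t = [Fraction(w, 36) for w in (3, 1, 4, 1, 5, 9, 2, 11)]   # mixing distribution T
a0 = [Fraction(1, 5), Fraction(1, 2), Fraction(2, 3)]
a1 = [Fraction(3, 4), Fraction(1, 10), Fraction(1, 3)]
b = [a1[i] - a0[i] for i in range(3)]
p = hidden_subset_probabilities(t, a0, a1)
k = cumulants(moments(p, 3), 3)
kT = cumulants(moments(t, 3), 3)
```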

A hidden subset model is any submodel obtained from (20) by setting some of the mixing parameters $t_I$ to zero. Thus a hidden subset model for $n$ binary variables is specified by a collection $\{I_1,\dots,I_h\}$ of subsets of $[n]$. These subsets indicate those mixing parameters $t_{I_1},\dots,t_{I_h}$ that are not zero. We next show that the two varieties in Section 4 arise as special cases of this.

Example 5.2. The hidden subset model $\{\{1\},\dots,\{n\}\}$ is given by

$$M(x) = \sum_{i=1}^n t_i\,(1 + a_i^{(1)}x_i)\prod_{j\neq i}(1 + a_j^{(0)}x_j).$$

This equals the tangential variety $\mathrm{Tan}((\mathbb{P}^1)^n)$ as in (13), but now with toric parameters $s_i = t_i(a_i^{(1)} - a_i^{(0)})$. We compute the cumulants $k_I^T$ of the mixing distribution $T$ using (5). The moments of $T$ satisfy $\mu_i^T = t_i$ and $\mu_I^T = 0$ for $|I|\geq 2$. This means that the sum in (5) has only one nonzero term, the one corresponding to the partition $\pi$ of $I$ into singleton blocks. Now, (5) reads

$$k_I^T = (-1)^{|I|-1}(|I|-1)!\prod_{i\in I} t_i.$$

We have shown that the formula (21) specializes to (15) for this model. □

Example 5.3. The hidden subset model $\{\emptyset, [n]\}$ is a mixture of two complete independence models, so it coincides with the secant variety $\mathrm{Sec}((\mathbb{P}^1)^n)$. The mixing distribution $T$ has one free mixing parameter $t$, where $t_{[n]} = t$ and $t_\emptyset = 1 - t$. The moments of $T$ are $\mu_\emptyset^T = 1$ and $\mu_B^T = t$ for $|B|\geq 1$. The formula (5) implies

$$k_I^T = \sum_{i=1}^{|I|} (-1)^{i-1}(i-1)!\,S(|I|, i)\,t^i,$$

where $S(|I|, i)$ is the number of set partitions of $I$ into $i$ blocks. This univariate polynomial depends only on the cardinality $v = |I|$. We can also write it as

$$K_v(t) = \sum_{i=1}^{v} (-1)^{i-1}\gamma_{v,i}\,t^i, \qquad (23)$$

where $\gamma_{v,i}$ is the number of cyclically ordered set partitions of a $v$-set into $i$ blocks. Such partitions are known as necklaces in enumerative combinatorics.

□
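The two descriptions of $K_v(t)$ in Example 5.3 can be compared mechanically. The following sketch (our own; the helper names are hypothetical) sums $(-1)^{|\pi|-1}(|\pi|-1)!$ over all set partitions $\pi$ of a $v$-set, and separately evaluates (23) under the assumption that a cyclic ordering of $i$ blocks can be chosen in $(i-1)!$ ways, so that $\gamma_{v,i} = (i-1)!\,S(v,i)$ with $S(v,i)$ the Stirling numbers of the second kind. Coefficient lists are indexed by the power of $t$.

```python
from math import factorial


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for j in range(len(part)):
            yield part[:j] + [[first] + part[j]] + part[j + 1:]


def K_from_partitions(v):
    """Coefficients [c_0, ..., c_v] of K_v(t), summing formula (5) over partitions."""
    coeffs = [0] * (v + 1)
    for pi in set_partitions(list(range(v))):
        i = len(pi)                      # each block has moment t, giving t^i
        coeffs[i] += (-1) ** (i - 1) * factorial(i - 1)
    return coeffs


def stirling2(v, i):
    """Stirling numbers of the second kind: S(v, i) = i S(v-1, i) + S(v-1, i-1)."""
    if v == i:
        return 1
    if i == 0 or i > v:
        return 0
    return i * stirling2(v - 1, i) + stirling2(v - 1, i - 1)


def K_from_necklaces(v):
    """Formula (23), assuming gamma_{v,i} = (i-1)! S(v, i)."""
    return [0] + [(-1) ** (i - 1) * factorial(i - 1) * stirling2(v, i) for i in range(1, v + 1)]


print(K_from_necklaces(4))  # [0, 1, -7, 12, -6], i.e. K_4(t) = t - 7t^2 + 12t^3 - 6t^4
```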

Example 5.4 (Binary Hidden Markov Model). The complete hidden subset model includes all hidden Markov models (HMM) where both hidden and observed states are binary. These models are widely used in computational biology [15, §1.4.3 and §11]. We can treat the mixing variable with distribution $T$ as a hidden binary process $Y = (Y_1,\dots,Y_n)$. The parameters $a_i^{(0)}$ and $a_i^{(1)}$ determine the conditional distribution of $X_i$ given the hidden process, and this observed distribution depends on $Y$ only through the value of $Y_i$. In this context the parameters $b_i = a_i^{(1)} - a_i^{(0)}$ are the linear regression coefficients of $X_i$ with respect to $Y_i$. For an HMM, the hidden distribution $T$ follows, in addition, a homogeneous Markov chain [15, §1.4.2]. Thus, if $k_I^T$ are the cumulants of the homogeneous Markov chain, then (21) gives a parametrization of the binary HMM. It would be interesting to revisit the recent work of Schönhuth [22] from this perspective. We expect his prime ideals $J_{3,n}$ in [22, §7.3] to have a nice representation in terms of cumulants. □

The set of all hidden subset models, for fixed $n$, forms a poset whose elements $\mathcal{M}_A$ are indexed by the $2^{2^n}$ subsets $A$ of $2^{[n]}$. The model $\mathcal{M}_A$ is obtained from the complete hidden subset model by setting $t_I = 0$ for all $I\subseteq[n]$ not in $A$. Of course, different labels $A$ and $B$ can lead to isomorphic hidden subset models $\mathcal{M}_A$ and $\mathcal{M}_B$. Clearly, this happens if $B$ is obtained from $A$ by a permutation of $[n]$. But, in fact, the full symmetry group of the $n$-cube acts on the hidden subset models:

Proposition 5.5. Let $A, B \subseteq 2^{[n]}$ and assume that $B$ is equal to $J \triangle A = \{J \triangle a : a \in A\}$ for some subset $J \subseteq [n]$. Then $M_A$ is isomorphic to $M_B$.

Proof. By Theorem 5.1, the two models are parametrized by $k_I = \tilde{k}_I \prod_{i \in I} \delta_i$. The cumulants $\tilde{k}_I$ depend on $A$ and $B$. By Corollary 3.6, if $B = J \triangle A$ for some $J$, then the respective cumulants $\tilde{k}_I$ for $A$ and $B$ agree up to sign. □

Each hidden subset model $M_A$ can be identified with a 0/1-polytope $P_A$. By Proposition 5.5, if $P_A$ and $P_B$ are 0/1-equivalent then $M_A$ and $M_B$ are isomorphic. We say that the model $M_A$ is non-degenerate if the polytope $P_A$ is not contained in any hyperplane $x_i = 0$ or $x_i = 1$ for $i = 1, \ldots, n$. If $P_A$ does lie in such a hyperplane, then the random variable $X_i$ is independent of all other variables. Geometrically this means that the variety of $M_A$ decomposes as a product of $\mathbb{P}^1$ and a smaller hidden subset model.

If $n = 2$ then, up to the symmetry of the 2-cube, there are precisely three distinct hidden subset models which are non-degenerate: $\{\emptyset, 1, 2, 12\}$, $\{\emptyset, 1, 2\}$, $\{\emptyset, 12\}$. Their models $M_A$ all parametrize the full tetrahedron $\Delta_3$ of distributions on $2^{[2]}$ or, in algebraic terms, the whole projective space $\mathbb{P}^3$.

If $n = 3$ then, up to symmetry of the 3-cube, there are precisely 19 collections $A$ of subsets of $\{1,2,3\}$ with $2 \leq |A| \leq 7$. Thirteen of these 19 models $M_A$ have codimension 0, that is, they are full-dimensional in the simplex $\Delta_7$ of probability distributions on $2^{[3]}$. One of these sets is $A = \{\emptyset, 123\}$, which represents $M_A = \mathrm{Sec}((\mathbb{P}^1)^3)$ and hence fills $\mathbb{P}^7$. The remaining six of the 19 models $M_A$ represent three distinct varieties. The first of them is the hyperdeterminantal hypersurface $\mathrm{Tan}((\mathbb{P}^1)^3)$. The other two varieties are a line and a point in cumulant space:



hidden subset model                 variety                                            codimension
{∅,1,2,3} or {∅,12,13}              $k_{123}^2 + 4k_{12}k_{13}k_{23} = 0$              1
{1,2} or {∅,1,2} or {∅,1,2,12}      $(k_{12}, k_{13}, k_{23}, k_{123}) = (*, 0, 0, 0)$   3
{∅,1}                               $(k_{12}, k_{13}, k_{23}, k_{123}) = (0, 0, 0, 0)$   4



The first row corresponds to non-degenerate models $M_A$ that do not fill $\Delta_7$. The situation becomes more interesting for $n \geq 4$, when we get a vast range of new models. Some of these will be discussed and catalogued in Section 6.



6. Context-specific independence 

This section concerns a class of statistical models that has proved to be useful in machine learning and computational biology [6], namely, context-specific independence (CSI) models for binary random variables. It has been observed in [13, §6.3] that both the tangential variety and the secant variety of the Segre variety are CSI models. Examples 5.2 and 5.3 expressed these as hidden subset models. We here generalize this relationship by identifying a natural class of binary CSI models with the class of hidden subset models.

The formal specification of a CSI model is as follows. Fix a multiset of $n$ partitions $\{\pi_1, \pi_2, \ldots, \pi_n\}$ of the set $[m] = \{1, \ldots, m\}$. The model is

$$P(X_1 = x_1, \ldots, X_n = x_n) \;=\; \sum_{i=1}^{m} t_i \prod_{j=1}^{n} P\bigl(X_j = x_j \mid \pi_j(i)\bigr). \qquad (24)$$

Here $\pi_j(i)$ is the block of the $j$-th partition $\pi_j$ that contains the class $i$, and $t_1, \ldots, t_m$ are mixing parameters for the classes. These satisfy $t_1 + \cdots + t_m = 1$.
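To make (24) concrete, here is a minimal Python sketch of the parametrization. It assumes that each partition is given as a list of blocks (frozensets of class labels $1, \ldots, m$) and that each block $b$ of $\pi_j$ carries a parameter `theta[j][b]` $= P(X_j = 1 \mid b)$. This encoding and the function names are our own, hypothetical choices. The helper `num_parameters` returns the count $\sum_j |\pi_j| + m - 1$ discussed below.

```python
from fractions import Fraction
from itertools import product


def csi_distribution(partitions, t, theta):
    """Evaluate the CSI model (24) at every binary state.

    partitions[j]: the blocks of pi_j, each a frozenset of classes 1..m.
    t[i - 1]: mixing weight t_i of class i.
    theta[j][block]: hypothetical parameter P(X_j = 1 | block) for a block of pi_j.
    Returns a dict mapping each state (x_1, ..., x_n) to its probability.
    """
    n, m = len(partitions), len(t)
    # block_of[j][i] is the block pi_j(i) that contains class i
    block_of = [{i: b for b in blocks for i in b} for blocks in partitions]
    dist = {}
    for x in product((0, 1), repeat=n):
        total = Fraction(0)
        for i in range(1, m + 1):
            term = t[i - 1]
            for j in range(n):
                q = theta[j][block_of[j][i]]
                term *= q if x[j] == 1 else 1 - q
            total += term
        dist[x] = total
    return dist


def num_parameters(partitions, m):
    """Parameter count sum_j |pi_j| + m - 1."""
    return sum(len(blocks) for blocks in partitions) + m - 1
```

When every $\pi_j$ is the partition into singletons, the function is just an ordinary mixture of $m$ product distributions.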

If each $\pi_j$ is the partition into singletons, then we can write $\pi_j(i) = i$ in (24), and the CSI model is the $m$-th secant variety of $(\mathbb{P}^1)^n$ in $\mathbb{P}^{2^n-1}$. This is known in statistics as a mixture of $n$ independent binary variables, or as the naive Bayes model. Hence every CSI model with $m$ hidden classes is a submodel of the naive Bayes model.

The CSI model has $\sum_{i=1}^{n} |\pi_i| + m - 1$ parameters and the dimension of the ambient space is $2^n - 1$. It is usually not identifiable, meaning that its dimension is smaller than the number of parameters. However, identifiability does hold for $m = 2$. Here the CSI model is the product of the Segre variety $(\mathbb{P}^1)^{n-k}$ and the first secant variety of $(\mathbb{P}^1)^k$, where $k = \#\{j \in [n] : \pi_j = 1|2\}$. This is the graphical model represented by a directed star tree with a hidden binary variable and $k$ leaves, together with $n - k$ isolated nodes.

From a statistical point of view it is sensible to assume the following:

(A1) All partitions in the model specification have at least two blocks.
(A2) There is no pair of elements $\{i, j\}$ such that for every partition in the model specification both $i$ and $j$ are in the same block of this partition.

If (A1) is violated then one random variable is independent of all others. Taking an appropriate margin, we can restrict our analysis to the remaining variables. If (A2) does not hold then the classes $i$ and $j$ can be joined to form a single class without changing the model. If $m = n = 3$ then up to symmetry we have three CSI models satisfying (A1) and (A2). The first case is $\pi_1 = 1|23$, $\pi_2 = 2|13$, and $\pi_3 = 3|12$. This is precisely our hyperdeterminantal hypersurface $\mathrm{Tan}((\mathbb{P}^1)^3) = V(k_{123}^2 + 4k_{12}k_{13}k_{23})$. The other two CSI models represent all distributions on $2^{[3]}$:

$$(\pi_1 = 1|23,\ \pi_2 = 12|3,\ \pi_3 = 1|2|3) \quad \text{or} \quad (\pi_1 = 1|23,\ \pi_2 = \pi_3 = 1|2|3).$$
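Conditions (A1) and (A2) are easy to test mechanically. The sketch below is written in Python; the string encoding such as `'1|23'` and the function names are hypothetical, and class labels are assumed to be single digits. It checks the conditions for the three CSI models with $m = n = 3$ listed above.

```python
from itertools import combinations


def parse(spec):
    """'1|23' -> [{1}, {2, 3}]; assumes single-digit class labels."""
    return [set(int(c) for c in block) for block in spec.split("|")]


def satisfies_a1_a2(specs, m):
    """Check (A1) and (A2) for a CSI specification on the classes 1..m."""
    parts = [parse(s) for s in specs]
    a1 = all(len(p) >= 2 for p in parts)           # (A1): at least two blocks each

    def together(i, j, p):                          # i and j share a block of p
        return any(i in b and j in b for b in p)

    # (A2): no pair {i, j} lies in a common block of every partition
    a2 = not any(all(together(i, j, p) for p in parts)
                 for i, j in combinations(range(1, m + 1), 2))
    return a1 and a2


print(satisfies_a1_a2(["1|23", "2|13", "3|12"], 3))   # True
print(satisfies_a1_a2(["1|23", "1|23", "1|23"], 3))   # False: 2 and 3 are never separated
```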

In the remainder of this section we study a special class of CSI models. Namely, we require that each partition $\pi_i$ has precisely two blocks. We call these models the CSI split models. Thus a CSI split model is represented by a collection $\{\pi_1, \pi_2, \ldots, \pi_n\}$ of splits of the set $[m]$ of hidden states. The following result identifies these models with the models in Section 5.

Proposition 6.1. The CSI split models are precisely the hidden subset models. 

Proof. Let $M_A$ be the hidden subset model defined by $A = \{J_1, \ldots, J_m\} \subseteq 2^{[n]}$. This is written as a CSI split model with $m$ hidden classes by taking the $n$ partitions $\pi_1, \ldots, \pi_n$ of $[m]$ to be $\pi_i = I|I^c$, where $\ell \in I$ if and only if $i \in J_\ell$. Conversely, suppose we are given a CSI split model $\{\pi_1, \ldots, \pi_n\}$. Then we regard $\pi_i$ as an ordered partition, and we recover the $m$ subsets in $A$ by taking $J_\ell$ to be the set of all $i \in [n]$ such that $\ell$ is in the first part of $\pi_i$. These transformations lead to identical parametrizations, and hence the corresponding models in $\Delta_{2^n-1}$ coincide. □
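The two translations in the proof can be sketched directly. In this Python sketch the function names are our own. Splits are stored as ordered pairs (first part, second part), and $A$ is a list $[J_1, \ldots, J_m]$, so that class $\ell$ corresponds to $J_\ell$.

```python
def subsets_to_splits(A, n):
    """Hidden subset model A = [J_1, ..., J_m] -> ordered splits pi_i = (I, I^c) of [m].

    Class l lies in the first part I of pi_i if and only if i is in J_l.
    """
    classes = frozenset(range(1, len(A) + 1))
    splits = []
    for i in range(1, n + 1):
        first = frozenset(l for l in classes if i in A[l - 1])
        splits.append((first, classes - first))
    return splits


def splits_to_subsets(splits, m):
    """Inverse direction: J_l is the set of i such that l is in the first part of pi_i."""
    return [frozenset(i for i, (first, _) in enumerate(splits, start=1) if l in first)
            for l in range(1, m + 1)]


A = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2, 3, 4})]
S = subsets_to_splits(A, 4)
print([(sorted(a), sorted(b)) for a, b in S])  # 24|13, 24|13, 34|12, 34|12, as in Table 1
print(splits_to_subsets(S, len(A)) == A)       # True
```

Listing the classes of Example 6.2 in the order ∅, 34, 24, 23, 14, 13, 12 reproduces exactly the splits 1234|567, 1256|347, 1357|246, 1467|235 given there.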

We classified all hidden subset models, and hence all CSI split models, for $n = 3$ at the end of the previous section. The next case $n = 4$ is much more interesting, as it offers a considerably wider range of possibilities. The classification for $n = 4$ will occupy us in the rest of this section.

Example 6.2 (The hyperdeterminant as a CSI model). Let $n = 4$, $m = 7$, and consider the hidden subset model $M_A$ where $A = \{\emptyset, 12, 13, 14, 23, 24, 34\}$. The corresponding CSI split model is $\{1234|567,\ 1256|347,\ 1357|246,\ 1467|235\}$. In algebraic geometry, the model $M_A$ corresponds to the second osculating variety of the Segre variety $(\mathbb{P}^1)^4$. This is a hypersurface of degree 24 in $\mathbb{P}^{15}$, namely, it is the hypersurface defined by the $2\times2\times2\times2$-hyperdeterminant. This result was pointed out to us by Luke Oeding and Giorgio Ottaviani, and we can easily verify it by a direct computation. The fact that $\mathrm{codim}(M_A) = 1$ is verified by computing the rank of the Jacobian of the parametrization (21) for random parameter values. The fact that $M_A$ equals $\{\mathrm{Det}(P) = 0\}$ is verified by plugging the parametrization (21) into the formula with 13819 monomials found in Theorem 2.1. We note that this model remains the same if we augment $A$ to $\{\emptyset, 1, 2, 3, 4, 12, 13, 14, 23, 24, 34\}$. □
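The Jacobian computation mentioned above can be sketched as follows, under two stated assumptions. First, we work in probability coordinates rather than with the cumulant parametrization (21); the codimension is the same because $P \mapsto K$ is invertible. Second, we take $M_A$ to be the mixture in which class $J \in A$ has weight $t_J$ and $X_i$ is Bernoulli with parameter $a_i^{(1)}$ if $i \in J$ and $a_i^{(0)}$ otherwise. All names are hypothetical. Exact rational arithmetic avoids numerical rank tolerances, and a random point gives the generic rank with high probability.

```python
import random
from fractions import Fraction
from itertools import product


def jacobian(A, n, rng):
    """Exact Jacobian of (t, a) -> P for the hidden subset model M_A at a random rational point.

    A: list of hidden classes (subsets of {1..n}). Free parameters: t_2..t_m (t_1 = 1 - rest)
    and a[i][b] = P(X_i = 1 | class), with b = 1 if i lies in the class and b = 0 otherwise.
    """
    m = len(A)
    rnd = lambda: Fraction(rng.randint(1, 999), 1000)
    w = [rnd() for _ in range(m)]
    t = [x / sum(w) for x in w]
    a = [[rnd(), rnd()] for _ in range(n)]
    side = [[1 if i + 1 in J else 0 for i in range(n)] for J in A]

    def f(l, i, xi):                     # P(X_i = xi | class l)
        q = a[i][side[l][i]]
        return q if xi == 1 else 1 - q

    def rest(l, x, skip):                # product of f over all variables except `skip`
        r = Fraction(1)
        for j in range(n):
            if j != skip:
                r *= f(l, j, x[j])
        return r

    rows = []
    for x in product((0, 1), repeat=n):
        full = [rest(l, x, None) for l in range(m)]
        row = [full[l] - full[0] for l in range(1, m)]            # d p(x) / d t_l
        for i in range(n):
            for b in (0, 1):                                      # d p(x) / d a[i][b]
                sign = 1 if x[i] == 1 else -1
                row.append(sum((t[l] * sign * rest(l, x, i) for l in range(m) if side[l][i] == b),
                               Fraction(0)))
        rows.append(row)
    return rows


def rank(rows):
    """Rank of a matrix of Fractions via Gaussian elimination."""
    M = [list(r) for r in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((k for k in range(r, len(M)) if M[k][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for k in range(r + 1, len(M)):
            if M[k][c] != 0:
                factor = M[k][c] / M[r][c]
                M[k] = [u - factor * v for u, v in zip(M[k], M[r])]
        r += 1
        if r == len(M):
            break
    return r


def codim(A, n, seed=0):
    """Codimension of M_A in the simplex of dimension 2^n - 1."""
    return (2 ** n - 1) - rank(jacobian(A, n, random.Random(seed)))


A62 = [frozenset(s) for s in [(), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]]
print(codim(A62, 4))  # 1, as stated in Example 6.2
```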
Hidden subset model   CSI split model                   codimension   degree
{∅,12,13,14}          1|234, 2|134, 3|124, 4|123        7             20
{∅,12,13,4}           14|23, 2|134, 3|124, 4|123        6             29
{∅,1,2,34}            2|134, 3|124, 4|123, 4|123        6             29
{∅,1,23,234}          2|134, 34|12, 34|12, 4|123        6             23
{∅,1,234,1234}        24|13, 34|12, 34|12, 34|12        6             23
{∅,1,2,134}           13|24, 3|124, 4|123, 4|123        5             44
{∅,1,12,234}          23|14, 34|12, 4|123, 4|123        5             44
{∅,1,123,234}         23|14, 34|12, 34|12, 4|123        5             44
{∅,1,23,124}          24|13, 34|12, 3|124, 4|123        5             31
{∅,12,134,234}        23|14, 24|13, 34|12, 34|12        5             22
{∅,12,13,24}          14|23, 13|24, 3|124, 4|123        4             44
{∅,13,23,124}         13|24, 12|34, 14|23, 4|123        4             38
{∅,12,34,1234}        24|13, 24|13, 34|12, 34|12        4             11
{∅,1,234}             2|13, 3|12, 3|12, 3|12            6             23
{∅,12,134}            1|23, 2|13, 3|12, 3|12            6             29
{∅,12,34}             2|13, 2|13, 3|12, 3|12            5             44
{∅,1234}              1|2, 1|2, 1|2, 1|2                6             23

Table 1. The 17 non-degenerate CSI split models on $n = 4$ binary variables with $m \leq 4$ hidden classes, up to symmetry.



We now come to the classification of CSI split models for $n = 4$. Each model lives in the space $\mathbb{C}^{11}$ with coordinates $k_{12}, \ldots, k_{34}, k_{123}, \ldots, k_{234}, k_{1234}$.

Proposition 6.3. Up to symmetry, for $n = 4$, there are 380 CSI split models satisfying (A1) and (A2). The number of models with $m$ hidden classes is

$$0, 1, 3, 13, 24, 47, 55, 73, 56, 50, 27, 19, 6, 4, 1, 1 \qquad \text{for } m = 1, 2, \ldots, 16.$$

In Table 1 we list the codimension and degree for all 17 models with $m \leq 4$.
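The orbit counts in Proposition 6.3 can be reproduced by brute force. The Python sketch below is our own, not the authors' code, and its names are hypothetical. It encodes a collection $A$ as a bitmask over the $2^n$ vertices of the $n$-cube and lets the $2^n \cdot n!$ cube symmetries act by permuting coordinates and then applying $I \mapsto J \triangle I$, as in Proposition 5.5. It then counts orbits of non-degenerate collections by size $m = |A|$. The counting rests on an observation of ours: under Proposition 6.1, (A1) amounts to non-degeneracy, and (A2) holds automatically because the subsets in $A$ are distinct. The same function also reproduces the counts for $n = 2$ and $n = 3$ stated at the end of Section 5.

```python
from itertools import permutations


def cube_symmetries(n):
    """All 2^n * n! symmetries of the n-cube acting on vertices (subsets of [n] as bitmasks)."""
    group = []
    for perm in permutations(range(n)):
        for flip in range(1 << n):                 # flip plays the role of J in J Δ I
            images = []
            for v in range(1 << n):
                w = 0
                for i in range(n):
                    if (v >> i) & 1:
                        w |= 1 << perm[i]
                images.append(w ^ flip)
            group.append(images)
    return group


def is_nondegenerate(S, n):
    """S encodes A (bit v set <=> subset v is in A); no hyperplane x_i = 0 or x_i = 1 contains it."""
    members = [v for v in range(1 << n) if (S >> v) & 1]
    return all(any((v >> i) & 1 for v in members) and not all((v >> i) & 1 for v in members)
               for i in range(n))


def orbit_counts(n):
    """Map |A| -> (number of orbits, number of non-degenerate orbits) under the cube group."""
    group = cube_symmetries(n)
    seen = bytearray(1 << (1 << n))
    counts = {}
    for S in range(len(seen)):
        if seen[S]:
            continue
        members = [v for v in range(1 << n) if (S >> v) & 1]
        for g in group:                            # mark the whole orbit of S
            image = 0
            for v in members:
                image |= 1 << g[v]
            seen[image] = 1
        total, nondeg = counts.get(len(members), (0, 0))
        counts[len(members)] = (total + 1, nondeg + is_nondegenerate(S, n))
    return counts


counts = orbit_counts(4)
print([counts[m][1] for m in range(1, 17)])
print(sum(nd for _, nd in counts.values()))  # 380
```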

For our classification we used the representation of each CSI split model as a hidden subset model $M_A$, where $A \subseteq 2^{[4]}$, given by Proposition 6.1. A choice of $A$ is also displayed for each model in Table 1. Note that two distinct models may define the same variety. By symmetry we can assume $\emptyset \in A$. We first generated the list of all non-degenerate sets $A$ of subsets of $\{1,2,3,4\}$ containing $\emptyset$, we then computed orbits under the symmetry group of the 4-cube, and finally we selected one representative per orbit. To compute the codimension $c$ of $M_A$, we evaluated the rank of the Jacobian of the polynomial map (21) at random values of the parameters. By the degree in Proposition 6.3 we mean the number of complex solutions on $M_A$ of a system of $11 - c$ inhomogeneous linear equations with random coefficients in the 11 unknowns $k_I$. We used Macaulay2 [7] to count the number of solutions to these equations. It would be desirable to compute the defining prime ideals for all models in Proposition 6.3, but we found this to be difficult for $m \geq 5$.
The 380 models represent a nice suite of test problems for implicitization in computer algebra. We close the section with one easy instance.

Example 6.4. The model $M_A$ with $A = \{\emptyset, 12, 34, 1234\}$ has CSI representation $\{12|34,\ 12|34,\ 13|24,\ 13|24\}$. Its prime ideal in cumulant coordinates is

$$\begin{aligned}
\bigl(\, & k_{13}k_{24} - k_{14}k_{23},\ \ k_{13}k_{124} - k_{14}k_{123},\ \ k_{13}k_{234} - k_{23}k_{134},\ \ k_{14}k_{234} - k_{24}k_{134},\\
& k_{23}k_{124} - k_{24}k_{123},\ \ k_{23}k_{1234} - k_{234}k_{123} + 2k_{14}k_{23}^2,\ \ k_{13}k_{1234} - k_{134}k_{123} + 2k_{14}k_{13}k_{23},\\
& k_{24}k_{1234} - k_{234}k_{124} + 2k_{14}k_{24}k_{23},\ \text{ and }\ k_{14}k_{1234} - k_{134}k_{124} + 2k_{14}^2k_{23} \,\bigr).
\end{aligned}$$

This CSI split model has codimension 4 and degree 11. □
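As a plausibility check on these generators, the following Python sketch builds an exact rational distribution in $M_A$ for $A = \{\emptyset, 12, 34, 1234\}$, using parameter values we chose arbitrarily. It computes the binary cumulants from the moments $\mu_B = P(X_i = 1 \text{ for all } i \in B)$ with the standard set-partition formula and then evaluates the nine generators. It assumes the mixture description of $M_A$ behind Proposition 6.1: class $J$ is chosen with probability $t_J$, and $X_i$ is Bernoulli with parameter $a_i^{(1)}$ if $i \in J$, else $a_i^{(0)}$. The function names are hypothetical. Vanishing at one point is of course far from a proof that these polynomials generate the prime ideal.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                     # `first` in a block of its own
        for i in range(len(part)):                  # or added to an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def cumulants(p, n):
    """Binary cumulants k_B (B a nonempty subset of {1..n}) of p: dict 0/1-tuple -> probability."""
    mu = {}
    for bits in product((0, 1), repeat=n):
        B = frozenset(i + 1 for i in range(n) if bits[i])
        mu[B] = sum(q for x, q in p.items() if all(x[i - 1] == 1 for i in B))
    return {B: sum((-1) ** (len(pi) - 1) * factorial(len(pi) - 1)
                   * prod(mu[frozenset(b)] for b in pi)
                   for pi in set_partitions(sorted(B)))
            for B in mu if B}


classes = [frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset({1, 2, 3, 4})]
t = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
a0 = [Fraction(1, 5), Fraction(1, 3), Fraction(1, 4), Fraction(2, 7)]  # P(X_i=1), i not in class
a1 = [Fraction(3, 4), Fraction(5, 6), Fraction(2, 3), Fraction(1, 2)]  # P(X_i=1), i in class
p = {}
for x in product((0, 1), repeat=4):
    total = Fraction(0)
    for w, J in zip(t, classes):
        term = w
        for i in range(4):
            q = a1[i] if i + 1 in J else a0[i]
            term *= q if x[i] == 1 else 1 - q
        total += term
    p[x] = total

k = cumulants(p, 4)


def K(*I):
    return k[frozenset(I)]


generators = [
    K(1, 3) * K(2, 4) - K(1, 4) * K(2, 3),
    K(1, 3) * K(1, 2, 4) - K(1, 4) * K(1, 2, 3),
    K(1, 3) * K(2, 3, 4) - K(2, 3) * K(1, 3, 4),
    K(1, 4) * K(2, 3, 4) - K(2, 4) * K(1, 3, 4),
    K(2, 3) * K(1, 2, 4) - K(2, 4) * K(1, 2, 3),
    K(2, 3) * K(1, 2, 3, 4) - K(2, 3, 4) * K(1, 2, 3) + 2 * K(1, 4) * K(2, 3) ** 2,
    K(1, 3) * K(1, 2, 3, 4) - K(1, 3, 4) * K(1, 2, 3) + 2 * K(1, 4) * K(1, 3) * K(2, 3),
    K(2, 4) * K(1, 2, 3, 4) - K(2, 3, 4) * K(1, 2, 4) + 2 * K(1, 4) * K(2, 4) * K(2, 3),
    K(1, 4) * K(1, 2, 3, 4) - K(1, 3, 4) * K(1, 2, 4) + 2 * K(1, 4) ** 2 * K(2, 3),
]
print(all(g == 0 for g in generators))  # True
```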



7. Semialgebraic geometry of the space of cumulants 

In the previous sections we studied binary cumulant varieties as objects of complex algebraic geometry. We examined their dimension, parametrization, and defining prime ideals, but we largely ignored the fact that parameters and probabilities are real and non-negative. In statistical applications, however, it is essential to work over the real numbers and to pay attention to the pertinent inequalities. In this section we seek to address this omission by asking the following fundamental question: Which $2\times2\times\cdots\times2$-tables $K = (k_I)_{I \subseteq [n]}$ with entries in the real numbers represent the cumulants of actual probability distributions $P = (p_J)_{J \subseteq [n]}$?

Our object of study is the image of the polynomial map $\Delta_{2^n-1} \to \mathbb{R}^{2^n-1}$ taking probability distributions $P$ to their cumulants $K$. This image is denoted $\mathcal{K}_n$. We call it the space of cumulants. The space of cumulants $\mathcal{K}_n$ is a semialgebraic subset of $\mathbb{R}^{2^n-1}$. This means that it has a description in terms of polynomial inequalities in the $k_I$. We begin by offering a convenient representation of these inequalities.

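Proposition 7.1. The space of cumulants $\mathcal{K}_n$ is a basic semialgebraic set in $\mathbb{R}^{2^n-1}$. It consists of the solutions of the polynomial inequalities

$$\sum_{\pi \in \Pi([n])} \prod_{B \in \pi} \rho_J(k_B) \;\geq\; 0 \qquad \text{for all } J \subseteq [n]. \qquad (25)$$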

Proof. The set $\mathcal{K}_n$ being basic semialgebraic means that it is described by a finite conjunction of polynomial inequalities. That conjunction is (25), and we now prove this. The moment $\mu_{1\cdots n}$ agrees with the probability $p_{1\cdots n}$, so it is non-negative on $\mathcal{K}_n$. Expressing $\mu_{1\cdots n}$ in terms of cumulants as in (6), we get

$$p_{1\cdots n} \;=\; \sum_{\pi \in \Pi([n])} \prod_{B \in \pi} k_B \;\geq\; 0.$$

By applying the transformation $\rho_J$ from (11) to this inequality, we obtain $p_{J^c} = \rho_J(p_{1\cdots n}) \geq 0$. This translates into the inequality (25) in cumulants. Since the transformation $P \mapsto K$ is invertible, we see that $\mathcal{K}_n$ has the desired representation. □
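The identity behind (25) can be checked in exact arithmetic. The transformation $\rho_J$ of (11) is not restated in this section, so the Python sketch below assumes it is the effect of replacing $X_i$ by $1 - X_i$ for every $i \in J$. Under that assumption $\rho_J(k_i) = 1 - k_i$ for $i \in J$, and $\rho_J(k_B) = (-1)^{|B \cap J|} k_B$ for $|B| \geq 2$, which is consistent with the inequalities (26) in Example 7.2. The sketch goes from cumulants to moments ($\mu_T = \sum_\pi \prod_B k_B$) and then to probabilities by Möbius inversion, and checks that the left-hand side of (25) equals $p_{[n] \setminus J}$. This is a polynomial identity, so arbitrary rational cumulants suffice. For $n = 3$ these are the eight cubics of Example 7.3. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(1, n + 1), r)]


def rho(J, B, k):
    """Assumed form of rho_J: the effect on k_B of replacing X_i by 1 - X_i for i in J."""
    if len(B) == 1:
        (i,) = B
        return 1 - k[B] if i in J else k[B]
    return (-1) ** len(B & J) * k[B]


def lhs_25(J, k, n):
    """Left-hand side of (25) for the subset J."""
    return sum(prod(rho(J, frozenset(b), k) for b in pi)
               for pi in set_partitions(list(range(1, n + 1))))


def probabilities(k, n):
    """p_S from cumulants: mu_T = sum_pi prod k_B, then p_S = sum_{T >= S} (-1)^|T - S| mu_T."""
    mu = {frozenset(): Fraction(1)}
    for T in subsets(n):
        if T:
            mu[T] = sum(prod(k[frozenset(b)] for b in pi) for pi in set_partitions(sorted(T)))
    return {S: sum((-1) ** len(T - S) * mu[T] for T in subsets(n) if S <= T)
            for S in subsets(n)}


n = 3
k = {B: Fraction(len(B) + 2 * min(B), 7 + sum(B)) for B in subsets(n) if B}  # arbitrary values
p = probabilities(k, n)
full = frozenset(range(1, n + 1))
print(all(lhs_25(J, k, n) == p[full - J] for J in subsets(n)))  # True
```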
Figure 1. The space of cumulants $\mathcal{K}_2$ is the solution set of (26).



Example 7.2 (Space of cumulants for $n = 2$). The probability distributions $P$ on the subsets of $\{1,2\}$ form a tetrahedron, and we map this tetrahedron into the 3-space with coordinates $(k_1, k_2, k_{12})$. The image of this map is the space of cumulants $\mathcal{K}_2$. Proposition 7.1 gives the semialgebraic representation:



Inequalities defining $\Delta_3$        Inequalities defining $\mathcal{K}_2$
$p_{12} \geq 0$                         $k_{12} \geq -k_1k_2$,
$p_1 \geq 0$                            $k_{12} \leq k_1(1-k_2)$,
$p_2 \geq 0$                            $k_{12} \leq k_2(1-k_1)$,
$p_\emptyset \geq 0$                    $k_{12} \geq -(1-k_1)(1-k_2)$.        (26)



The solution set of these four quadratic inequalities is depicted in Figure 1. In this diagram we see clearly how $\mathcal{K}_2$ arises as a non-linear image of the tetrahedron $\Delta_3$. Note that the body $\mathcal{K}_2$ is not convex. The square $\{0 \leq k_1, k_2 \leq 1\}$ in the plane $\{k_{12} = 0\}$ is the image of the independence model $\{p_\emptyset p_{12} = p_1p_2\}$, which contains four of the six edges of $\Delta_3$. The other two edges of the tetrahedron $\Delta_3$ are mapped to the quadratic curves that form the ridges at the top and the bottom of $\mathcal{K}_2$. □
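The non-convexity of $\mathcal{K}_2$ can be seen on a concrete pair of points. The Python sketch below uses exact fractions, and its names are hypothetical. It maps a distribution to $(k_1, k_2, k_{12})$ via $k_1 = p_1 + p_{12}$, $k_2 = p_2 + p_{12}$ and $k_{12} = p_{12} - k_1k_2$, tests the inequalities (26), and shows that the midpoint of two points of $\mathcal{K}_2$ can violate $k_{12} \leq k_2(1-k_1)$.

```python
from fractions import Fraction as F


def cumulants2(p0, p1, p2, p12):
    """(k1, k2, k12) of the distribution (p_empty, p_1, p_2, p_12) on the subsets of {1, 2}."""
    k1, k2 = p1 + p12, p2 + p12
    return k1, k2, p12 - k1 * k2


def in_K2(k1, k2, k12):
    """The four inequalities (26)."""
    return (k12 >= -k1 * k2 and k12 <= k1 * (1 - k2)
            and k12 <= k2 * (1 - k1) and k12 >= -(1 - k1) * (1 - k2))


P = cumulants2(F(1, 2), F(0), F(0), F(1, 2))   # p_empty = p_12 = 1/2, on the top ridge
Q = cumulants2(F(0), F(1), F(0), F(0))         # point mass on {1}
M = tuple((a + b) / 2 for a, b in zip(P, Q))   # midpoint (3/4, 1/4, 1/8)
print(P[2], in_K2(*P), in_K2(*Q), in_K2(*M))   # 1/4 True True False
```

The midpoint fails because $k_2(1-k_1) = 1/16 < 1/8$.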



Example 7.3 (The space of cumulants for $n = 3$). We now consider the simplex $\Delta_7$ of distributions on subsets of $\{1,2,3\}$. Its image in cumulant coordinates is the 7-dimensional closed basic semialgebraic set $\mathcal{K}_3$. Both $\Delta_7$ and $\mathcal{K}_3$ are defined by the constraints that the following eight expressions should be non-negative:

$$p_{123} = \mu_{123} = k_{123} + k_{12}k_3 + k_{13}k_2 + k_{23}k_1 + k_1k_2k_3$$

$$p_{12} = -\mu_{123} + \mu_{12} = -k_{123} - k_{12}(k_3 - 1) - k_{13}k_2 - k_{23}k_1 - k_1k_2(k_3 - 1)$$

$$p_{13} = -\mu_{123} + \mu_{13} = -k_{123} - k_{12}k_3 - k_{13}(k_2 - 1) - k_{23}k_1 - k_1(k_2 - 1)k_3$$

$$p_{23} = -\mu_{123} + \mu_{23} = -k_{123} - k_{12}k_3 - k_{13}k_2 - k_{23}(k_1 - 1) - (k_1 - 1)k_2k_3$$

$$p_1 = \mu_{123} - \mu_{12} - \mu_{13} + \mu_1 = k_{123} + k_{12}(k_3 - 1) + k_{13}(k_2 - 1) + k_{23}k_1 + k_1(k_2 - 1)(k_3 - 1)$$

$$p_2 = \mu_{123} - \mu_{12} - \mu_{23} + \mu_2 = k_{123} + k_{12}(k_3 - 1) + k_{13}k_2 + k_{23}(k_1 - 1) + (k_1 - 1)k_2(k_3 - 1)$$

$$p_3 = \mu_{123} - \mu_{13} - \mu_{23} + \mu_3 = k_{123} + k_{12}k_3 + k_{13}(k_2 - 1) + k_{23}(k_1 - 1) + (k_1 - 1)(k_2 - 1)k_3$$

$$\begin{aligned}
p_\emptyset &= -\mu_{123} + \mu_{12} + \mu_{13} + \mu_{23} - \mu_1 - \mu_2 - \mu_3 + 1\\
&= -k_{123} - k_{12}(k_3 - 1) - k_{13}(k_2 - 1) - k_{23}(k_1 - 1) - (k_1 - 1)(k_2 - 1)(k_3 - 1)
\end{aligned}$$

Thus the space $\mathcal{K}_3$ is defined by eight cubic inequalities in $\mathbb{R}^7$. □

Equipped with the inequality description of $\mathcal{K}_n$, we can now try to answer questions about the geometry of cumulants of probability distributions. One natural such question is to identify the smallest box containing $\mathcal{K}_n$. This is equivalent to finding tight upper and lower bounds on the possible values of the cumulants $k_I$. The following problem was suggested by Gian-Carlo Rota and his collaborators in [1]:

$$\text{Maximize } |k_{12\cdots n}| \text{ subject to } k \in \mathcal{K}_n. \qquad (27)$$

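In this problem, the absolute value sign around $k_{12\cdots n}$ can be removed, because the set of values that $k_{12\cdots n}$ takes on $\mathcal{K}_n$ is symmetric under $k_{12\cdots n} \mapsto -k_{12\cdots n}$. This is shown in [ , Proposition 3.1], and it also follows directly from the symmetries in Corollary 3.6. Let $k^*_n$ denote the optimal value of (27). Figure 1 shows that $k^*_2 = 1/4$, and one easily derives an algebraic proof from the inequalities in Example 7.2: if $k_{12} \geq 0$ then (26) gives $k_{12}^2 \leq k_1(1-k_2)\,k_2(1-k_1) \leq 1/16$, and if $k_{12} < 0$ then (26) gives $k_{12}^2 \leq k_1k_2\,(1-k_1)(1-k_2) \leq 1/16$. The probability distribution $p_\emptyset = p_{12} = \tfrac12$ attains $k^*_2 = 1/4$. It has been conjectured by Bruno, Rota and Torney [1] that the analogous distribution solves the optimization problem (27) for all even values of $n$: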

Conjecture 7.4. [1, bottom of page 16] If $n \geq 2$ is an even integer then

$$k^*_n \;=\; |\kappa_n(\tfrac12)| \;=\; (-1)^{n/2} \sum_{i=1}^{n} (-1)^{i}\,\gamma_{n,i}\,2^{-i}.$$

This value is attained by the probability distribution $p_\emptyset = p_{12\cdots n} = \tfrac12$.

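Here $\kappa_n$ is the polynomial in (23). The first values of the bound $|\kappa_n(\tfrac12)|$ are $\tfrac14, \tfrac18, \tfrac14, \tfrac{17}{16}, \tfrac{31}{4}$ for $n = 2, 4, 6, 8, 10$. It has been remarked in [1] that

$$|\kappa_n(\tfrac12)| \;\sim\; \frac{2\,(n-1)!}{\pi^n} \qquad \text{for } n \gg 0.$$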
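These values and the asymptotic formula can be checked with a short exact computation. The Python sketch below uses hypothetical helper names. It evaluates $\kappa_n(\tfrac12)$ from (23) with $\gamma_{n,i} = (i-1)!\,S_{n,i}$ and compares $|\kappa_n(\tfrac12)|$ with $2(n-1)!/\pi^n$.

```python
from fractions import Fraction
from functools import lru_cache
from math import factorial, pi


@lru_cache(maxsize=None)
def stirling2(v, i):
    """Number of set partitions of a v-set into i blocks."""
    if v == i:
        return 1
    if i == 0 or i > v:
        return 0
    return i * stirling2(v - 1, i) + stirling2(v - 1, i - 1)


def kappa_half(n):
    """kappa_n(1/2) from (23), with gamma_{n,i} = (i-1)! * S(n, i)."""
    return sum(Fraction((-1) ** (i - 1) * factorial(i - 1) * stirling2(n, i), 2 ** i)
               for i in range(1, n + 1))


for n in (2, 4, 6, 8, 10):
    bound = abs(kappa_half(n))
    ratio = float(bound) / (2 * factorial(n - 1) / pi ** n)
    print(n, bound, round(ratio, 5))
```

The printed ratio tends to 1 as $n$ grows. For odd $n \geq 3$ the same function returns 0.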

If $n \geq 3$ is an odd integer then $\kappa_n(\tfrac12) = 0$, and no conjectured value for $k^*_n$ has been suggested in [1]. Using recent computational advances in certified polynomial optimization, we attacked the problem (27) for $n = 3$ and $n = 4$, thus confirming the conjecture of Bruno, Rota and Torney in the first non-trivial case. Namely, we found that the upper bound on cumulants of probability set functions satisfies

$$k^*_3 = k^*_4 = \tfrac18. \qquad (28)$$

For $n = 3$ we used the software Bermeja [19] to compute a sums of squares certificate via semidefinite programming. We are grateful for the help provided by Philipp Rostalski. Let us now explain this certificate. We consider the following cubic polynomial in the seven moment coordinates $\mu_I$:

$$\tfrac18 - k_{123} \;=\; \tfrac18 - \mu_{123} + \mu_1\mu_{23} + \mu_2\mu_{13} + \mu_3\mu_{12} - 2\mu_1\mu_2\mu_3.$$

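Our aim is to prove that this polynomial is non-negative on the simplex $\Delta_7$. We do this by rewriting the polynomial in the following special form

$$\tfrac18 - k_{123} \;=\; \sigma_\emptyset\,p_\emptyset + \sigma_1p_1 + \sigma_2p_2 + \sigma_3p_3 + \sigma_{12}p_{12} + \sigma_{13}p_{13} + \sigma_{23}p_{23} + \sigma_{123}p_{123}, \qquad (29)$$

where the $p_I$ are the linear forms in the moments from Example 7.3, which are non-negative on $\Delta_7$, and each of the eight multipliers $\sigma_I$ is a sum of squares of linear polynomials in the moments $\mu_J$. Each such sum of squares corresponds to a positive semidefinite quadratic form, and it can be represented by a symmetric $8\times8$-matrix $\Sigma_I$ as follows:

$$\sigma_I = \mu \cdot \Sigma_I \cdot \mu^T \qquad \text{where } \mu = (1, \mu_1, \mu_2, \mu_3, \mu_{12}, \mu_{13}, \mu_{23}, \mu_{123}). \qquad (30)$$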

Our certificate for $k^*_3 = 1/8$ is a tuple $(\Sigma_\emptyset, \Sigma_1, \Sigma_2, \Sigma_3, \Sigma_{12}, \Sigma_{13}, \Sigma_{23}, \Sigma_{123})$ of positive semidefinite symmetric $8\times8$-matrices such that (29) and (30) hold. Finding such a tuple of matrices is an instance of semidefinite programming.

We attempted to find a similar proof for the second identity $k^*_4 = 1/8$, but the required computations have so far turned out to be too difficult. The idea was to take advantage of the symmetries that preserve the optimization problem (27). They form a group of order 192, which has index 2 in the symmetry group of the 4-cube. Our hope was to use the dual moment formulation due to Riener et al. [18], but this has not yet terminated successfully. Instead, we verified the identity $k^*_4 = 1/8$ by running numerous applications of standard implementations of numerical optimization in R and Matlab. Running these hill climbing methods from a multitude of different starting values confirms the desired result with very high confidence.
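The R and Matlab runs are not reproduced here. As a rough, hypothetical stand-in, the Python sketch below performs multistart random-perturbation hill climbing on $|k_{12\cdots n}|$, parametrizing the open simplex by a softmax. It is a heuristic check, not a certificate. It also evaluates two extremal points. The first is the distribution $p_\emptyset = p_{1234} = \tfrac12$ from Conjecture 7.4. The second, for $n = 3$, is the uniform distribution on $\{1, 2, 3, 123\}$, which one checks by hand has $k_{123} = 1/8$.

```python
import math
import random
from fractions import Fraction


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def partition_terms(n):
    """(coefficient, block bitmasks) pairs with k_{12..n} = sum coeff * prod mu_B."""
    return [((-1) ** (len(pi) - 1) * math.factorial(len(pi) - 1),
             [sum(1 << i for i in block) for block in pi])
            for pi in set_partitions(list(range(n)))]


def top_cumulant(p, n, terms):
    """k_{12..n} of p, a list of 2^n probabilities indexed by the bitmask of the set of ones."""
    mu = list(p)
    for i in range(n):                         # superset sums: mu[B] = sum of p[S] over S containing B
        for s in range(1 << n):
            if not (s >> i) & 1:
                mu[s] += mu[s | (1 << i)]
    return sum(c * math.prod(mu[b] for b in masks) for c, masks in terms)


def hill_climb(n, rng, starts=5, steps=1500, scale=0.3):
    """Naive multistart random-perturbation ascent of |k_{12..n}| over the open simplex."""
    terms = partition_terms(n)

    def value(z):
        top = max(z)
        w = [math.exp(v - top) for v in z]     # softmax keeps the point inside the simplex
        s = sum(w)
        return abs(top_cumulant([v / s for v in w], n, terms))

    best_overall = 0.0
    for _ in range(starts):
        z = [rng.gauss(0, 1) for _ in range(1 << n)]
        best = value(z)
        for _ in range(steps):
            cand = [v + rng.gauss(0, scale) for v in z]
            val = value(cand)
            if val > best:
                z, best = cand, val
        best_overall = max(best_overall, best)
    return best_overall


half = [Fraction(0)] * 16
half[0] = half[15] = Fraction(1, 2)               # p_empty = p_1234 = 1/2
print(top_cumulant(half, 4, partition_terms(4)))  # -1/8
print(hill_climb(4, random.Random(0)))            # expected not to exceed 0.125 if k*_4 = 1/8
```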



References 

[1] W. Bruno, G. Rota, and D. Torney: Probability set functions, Annals of Combinatorics, 3 (1999), pp. 13-25.
[2] D. Bruynooghe and H. Wynn: Differential cumulants, hierarchical models and monomial ideals, arXiv:1102.2118.
[3] M. Drton, B. Sturmfels, and S. Sullivant: Lectures on Algebraic Statistics, Oberwolfach Seminars, Vol. 39, Birkhäuser, Basel, 2009.
[4] M. J. Duff: String triality, black hole entropy, and Cayley's hyperdeterminant, Phys. Rev. D, 76 (2007), 025017, 4.
[5] I.M. Gel'fand, M.M. Kapranov, and A.V. Zelevinsky: Discriminants, Resultants, and Multidimensional Determinants, Birkhäuser, 1994.
[6] B. Georgi and A. Schliep: Context-specific independence mixture modeling for positional weight matrices, Bioinformatics, 22 (2006), pp. e166-e173.
[7] D. Grayson and M. Stillman: Macaulay2, a software system for research in algebraic geometry, available at http://www.math.uiuc.edu/Macaulay2/.
[8] A. Hald: The early history of the cumulants and the Gram-Charlier series, International Statistical Review, 68 (2000), pp. 137-153.






[9] O. Holtz and B. Sturmfels: Hyperdeterminantal relations among symmetric principal minors, Journal of Algebra, 316 (2007), pp. 634-648.
[10] P. Huggins, B. Sturmfels, J. Yu, and D. Yuster: The hyperdeterminant and triangulations of the 4-cube, Mathematics of Computation, 77 (2008), pp. 1653-1680.
[11] P. McCullagh: Tensor Methods in Statistics, Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1987.
[12] A. Miyake: Classification of multipartite entangled states by multidimensional determinants, Phys. Rev. A (3), 67 (2003), 012108, 10.
[13] L. Oeding: Set-theoretic defining equations of the tangential variety of the Segre variety, J. Pure and Applied Algebra, 215 (2011), pp. 1516-1527.
[14] L. Oeding: Set-theoretic defining equations of the variety of principal minors of symmetric matrices, Algebra and Number Theory, to appear.
[15] L. Pachter and B. Sturmfels: Algebraic Statistics for Computational Biology, Cambridge University Press, 2005.
[16] G. Pistone and H.P. Wynn: Cumulant varieties, J. Symbolic Computation, 41 (2006), pp. 210-221.
[17] C. Raicu: The GSS conjecture, arXiv:1011.5867.
[18] C. Riener, T. Theobald, L. Jansson Andren and J.B. Lasserre: Exploiting symmetries in SDP-relaxations for polynomial optimization, arXiv:1103.0486.
[19] P. Rostalski: Bermeja: software for convex algebraic geometry, available at math.berkeley.edu/~philipp/cagwiki.
[20] G.-C. Rota and S. Roman: The umbral calculus, Adv. Math., 27 (1978), pp. 95-188.
[21] G.-C. Rota and J. Shen: On the combinatorics of cumulants, J. Combin. Theory Ser. A, 91 (2000), pp. 283-304.
[22] A. Schönhuth: Complete identification of binary-valued hidden Markov processes, arXiv:1101.3712.
[23] T.P. Speed: Cumulants and partition lattices, Austral. J. Statistics, 25 (1983), pp. 378-388.
[24] R.P. Stanley: Enumerative Combinatorics. Volume I, no. 49 in Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2002.
[25] J.G. Sumner and P.D. Jarvis: Using the tangle: a consistent construction of phylogenetic distance matrices for quartets, Mathematical Biosciences, 204 (2006), pp. 49-67.
[26] D.C. Torney: Binary cumulants, Advances in Applied Math, 25 (2000), pp. 34-40.



Bernd Sturmfels

Department of Mathematics, University of California, Berkeley, CA 94720, USA, bernd@math.berkeley.edu

Piotr Zwiernik

Institut Mittag-Leffler, 18260 Djursholm, Sweden, piotr.zwiernik@gmail.com



